PREFACE

The Seventh Copper Mountain Conference on Multigrid Methods was held on April 
2-7, 1995, at Copper Mountain, Colorado, and was sponsored by NASA and the 
Department of Energy. The University of Colorado, Front Range Scientific Computations, 
Inc., and the Society for Industrial and Applied Mathematics provided organizational 
support for the conference. 

This document is a collection of many of the papers that were presented at the
conference and thus represents the conference proceedings. NASA Langley has graciously
provided printing of this book so that all of the papers could be presented in a single 
forum. Each paper was reviewed by a member of the conference organizing committee 
under the coordination of the editors. 

The multigrid discipline continues to expand and mature, as is evident from these 
proceedings. The vibrancy and diversity in this field are amply expressed in these 
important papers, and the collection clearly shows the continuing rapid growth of the 
use of multigrid acceleration techniques. 

MULTIGRID HISTORY

(At the awards ceremony of the conference, Achi Brandt presented the following 
history of multigrid. The reader should study the truths contained herein and revel in 
the humor.) 

The early history of multigrid has recently become a hot subject of research. An 
ancient multigrid code was uncovered during extensive excavations last year in northern 
Turkestan. Carbon tests indicate that this code has an efficiency of 5.1 on the Richter 
scale. Some researchers believe that the V cycle was practiced by the Neanderthals. 
The use of the Full Multigrid (FMG) algorithm was, however, unique to Homo sapiens and 
is one of the major reasons for their ultimate survival. Prototypes of two-grid algorithms 
predate the first hominids. Most historians agree that coarsening was, in fact, invented 
by the dinosaurs; however, coarse-to-fine grid transfers were unknown to them, which 
explains their extinction. 

Earlier geological findings include rich multilevel deposits that have been unearthed 
in several North American gold mines, and thick layers of old multigridders have been 
discovered at Copper Mountain. 

The artifacts at the northern Turkestan site indicate that an early form of residual 
weighting was already in widespread use before the middle Full Approximation Storage 
(FAS) period. When Copernicus first introduced line relaxation, it was banned by the 
Catholic church. Pope Pointus the Square decreed that mere mortals should not practice 
such nonlocal schemes. He feared this practice would lead humanity to incompleteness, 
in particular to the incomplete LU decomposition of the Dutch church. The advent of 
variational coarsening during the French Revolution marks the dawn of the modern era, 
which is quite familiar to us all. 




CONTENTS


PREFACE iii

ORGANIZING COMMITTEE iv

ATTENDEES v

MULTIGRID HISTORY ix


Part 1

A Multigrid Algorithm for Immersed Interface Problems 1 

Loyce Adams 

Smoothers for Optimization Problems 15 

Eyal Arian and Shlomo Ta’asan 

Multigrid With Overlapping Patches 31 

Markus Berndt and Kristian Witsch 

First-Order System Least-Squares for the Navier-Stokes Equations 41

P. Bochev, Z. Cai, T. A. Manteuffel, and S. F. McCormick 

MGLab: An Interactive Multigrid Environment 57 

James Bordner and Faisal Saied 

A Full Multi-Grid Method for the Solution of the Cell Vertex Finite Volume 

Cauchy-Riemann Equations 73 

A. Borzi, K. W. Morton, E. Suli, and M. Vanmaele 

Multilevel Algorithm for Atmospheric Data Assimilation 87 

Achi Brandt and Leonid Yu. Zaslavsky 

Effective Boundary Treatment for the Biharmonic Dirichlet Problem ...... 97 

A. Brandt and J. Dym 

Multigrid Acceleration of Time-Accurate DNS of Compressible Turbulent 

Flow 109

Jan Broeze, Bernard Geurts, Hans Kuerten, and Martin Streng 

First-Order System Least Squares for Velocity-Vorticity-Pressure Form of the 
Stokes Equations, With Application to Linear Elasticity 123

Zhiqiang Cai, Thomas A. Manteuffel, and Stephen F. McCormick

First-Order System Least Squares for the Stokes Equations, With Application
to Linear Elasticity 133

Z. Cai, T. A. Manteuffel, and S. F. McCormick 
Towards an FVE-FAC Method for Determining Thermocapillary Effects on 

Weld Pool Shape 147 

David Canright and Van Emden Henson

Quasi-Optimal Schwarz Methods for the Conforming Spectral Element
Discretization 167

Mario Casarin 

Recent Development of Multigrid Algorithms for Mixed and Nonconforming

Methods for Second Order Elliptic Problems 183 

Zhangxin Chen and Richard E. Ewing 

Effective Numerical Methods for Solving Elliptic Problems in Strengthened 

Sobolev Spaces 199 

Eugene G. D’yakonov 

A Parallel Multilevel Spectral Element Scheme 213 

M. B. Davis and G. F. Carey 

Revenge of the Semicoarsening Frequency Decomposition Multigrid

Method 227 

J. E. Dendy, Jr. 

An Optimal Order Nonnested Mixed Multigrid Method for Generalized Stokes
Problems 241 

Qingping Deng 

A Note on Multigrid Theory for Non-Nested Grids and/or Quadrature .... 255 
C. C. Douglas, J. Douglas, Jr., and D. E. Fyfe

The Effects of Dissipation and Coarse Grid Resolution for Multigrid in Flow
Problems 265 

Peter Eliasson and Bjorn Engquist 

Multigrid and Krylov Subspace Methods for the Discrete Stokes Equations 283 
Howard C. Elman 

A New Coarsening Operator for the Optimal Preconditioning of the Dual and 
Primal Domain Decomposition Methods: Application to Problems with 
Severe Coefficient Jumps 301 

Charbel Farhat and Daniel Rixen 

High Performance Parallel Multigrid Algorithms for Unstructured Grids ... 317 
Paul O. Frederickson 

A Cell-Centered Multigrid Algorithm for All Grid Sizes 327 

Thor Gjesdal 

Numerical Study of Multigrid Methods with Various Smoothers for the Elliptic 

Grid Generation Equations 339 

W. L Golik 

Some Aspects of Multigrid Methods on Non-Structured Meshes ........ 347 

H. Guillard and N. Marco 





Schwarz Methods: To Symmetrize or Not To Symmetrize 363 

Michael Holst and Stefan Vandewalle 

A Mixed Finite Volume Element Method for Flow Calculations in Porous

Media


Jim E. Jones 

Implicit Extrapolation Methods for Variable Coefficient Problems 393 

M. Jung and U. Rude 

Part 2* 


A Pressure Based Multigrid Procedure for the Navier-Stokes Equations on 

Unstructured Grids 409 

R. Jyotsna and S. P. Vanka 

The Multigrid-Mask Numerical Method for Solution of Incompressible

Navier-Stokes Equations 425

Hwar-Ching Ku and Aleksander S. Popel

Implementation of Hybrid V-Cycle Multilevel Methods for Mixed Finite

Element Systems with Penalty 439 

Chen-Yao G. Lai 

A Conforming Multigrid Method for the Pure Traction Problem of Linear 

Elasticity: Mixed Formulation 455 

Chang-Ock Lee 

Multiple Scale Simulation for Transitional and Turbulent Flow 473

Chaoqun Liu and Zhining Liu 

A Note on Substructuring Preconditioning for Nonconforming Finite Element 

Approximations of Second Order Elliptic Problems 489 

Serguei Maliassov 

Convergence of a Substructuring Method With Lagrange Multipliers ..... 503 
Jan Mandel and Radek Tezaur 

A Systematic Solution Approach for Neutron Transport Problems in Diffusive 
Regimes 519

T. A. Manteuffel and K. J. Ressel 

First-Order System Least-Squares for Second-Order Elliptic Problems with 

Discontinuous Coefficients 535 

Thomas A. Manteuffel, Stephen F. McCormick, and Gerhard Starke 

On DGS Relaxation: The Stokes Problem 551 

A. J. Meir 

Multigrid Acceleration of Time-Accurate Navier-Stokes Calculations ..... 565 
N. Duane Melson and Mark D. Sanetrik 


*Part 2 is presented under separate cover.


Multigrid Methods for Fully Implicit Oil Reservoir Simulation 581

J. Molenaar 

Coarsening Strategies for Unstructured Multigrid Techniques with 

Application to Anisotropic Problems 591 

E. Morano, D. J. Mavriplis, and V. Venkatakrishnan 

Preconditioning Operators on Unstructured Grids 607 

S. V. Nepomnyaschikh 

Multigrid Methods for EHL Problems 623 

Elyas Nurgat and Martin Berzins 

Multigrid and Krylov Subspace Methods for Transport Equations: Absorption 

Case 637

S. Oliveira 

Fast Multigrid Techniques in Total Variation-Based Image Reconstruction . 649 
Mary Ellen Oman 

A Multilevel Algorithm for the Solution of Second Order Elliptic Differential 
Equations on Sparse Grids 661 

Christoph Pflaum 

Error and Complexity Analysis for a Collocation-Grid-Projection Plus 
Precorrected-FFT Algorithm for Solving Potential Integral Equations with 

Laplace or Helmholtz Kernels 673 

J. R. Phillips 

Multigrid Techniques for Highly Indefinite Equations 689 

Yair Shapira 

A Genuinely Two-Dimensional Scheme for the Compressible Euler 
Equations ............................................ 7 9 7 

David Sidilkover 

Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order 
Elliptic Problems 721 

Petr Vanek, Jan Mandel, and Marian Brezina 
Krylov Subspace and Multigrid Methods Applied to the Incompressible 

Navier-Stokes Equations 737 

C. Vuik, P. Wesseling, and S. Zeng 

An Algebraic Multigrid Solver for Navier-Stokes Problems in the Discrete 
Second-Order Approximation 755 

R. Webster 

Multiple Coarse Grid Multigrid Methods for Solving Elliptic Problems .... 771 
Shengyou Xiao and David Young 




New Nonlinear Multigrid Analysis 793 

Dexuan Xie 



Xiaoqing Zheng, Chaoqun Liu, Changming Liao, Zhining Liu, and Steve McCormick 




A MULTIGRID ALGORITHM FOR IMMERSED INTERFACE PROBLEMS 

Loyce Adams 1 

Dept. of Applied Mathematics
University of Washington 


SUMMARY 


Many physical problems involve interior interfaces across which the coefficients in the problem, 
the solution, its derivatives, the flux, or the source term may have jumps. These interior interfaces 
may or may not align with an underlying Cartesian grid. Zhilin Li, in his dissertation, showed how
to discretize such elliptic problems using only a Cartesian grid and the known jump conditions to 
second order accuracy. In this paper, we describe how to apply the full multigrid algorithm in this 
context. In particular, the restriction, interpolation, and coarse grid problem will be described. 
Numerical results for several model problems are given to demonstrate that good rates can be 
obtained even when jumps in the coefficients are large and do not align with the grid. 

1. INTRODUCTION 

Many physical problems involve interior interfaces across which the coefficients in the problem, 
the solution, its derivatives, the flux, or the source term may have jumps. These interior interfaces 
may or may not align with an underlying Cartesian grid. As an example, single phase Darcy flow in
porous media is governed by the equation $\nabla \cdot (\beta \nabla p) = 0$ for the pressure $p$, where $\beta = \kappa/\mu$ with $\kappa$
the permeability and $\mu$ the viscosity. If the medium has an interface across which the permeability
varies, we know that $[p] = 0$ and $[\beta p_n] = 0$ at this interface. Another example is Stokes flow where
the interface is the boundary of a moving membrane or bubble, ([1], [2]). A more complicated
problem is to model the blood flow in the human heart. Here the interface is the boundary of
the heart. Peskin [3] solves for the velocity of the fluid in which the heart is immersed by solving
the Navier-Stokes equations on a Cartesian grid with a delta function forcing term determined by
the force the heart wall exerts on the fluid. It can be shown [3] that this singular source term in
the Navier-Stokes equations leads to jumps in pressure and the derivatives of velocity across the
interface. The source term is discretized by discrete delta functions and transferred to the nearby
Cartesian grid points. The velocity of the fluid is then used to move the boundary of the heart to
its position at the next time step. This procedure is called the immersed boundary method and seems
to be only first order accurate due to the way the force on the interface is spread to the Cartesian grid.

Zhilin Li has recently developed an approach for discretizing elliptic problems with interior
interfaces called the immersed interface method (IIM), ([4], [5]), which can handle both discontinuous
coefficients and singular sources. The idea is to compute on a Cartesian grid only, as in Peskin’s
immersed boundary method, but to find accurate discretization stencils by incorporating knowledge
of where the interface is located and the known jumps in the solution there, rather than by smearing
the force with a discrete delta function. Li showed that second-order accurate discretizations could
be found for a wide class of problems. Of course, there are problems of physical interest where the
jumps at the interface are not known a priori and must be solved for first before such an approach
can be taken. Such problems and solution techniques are discussed in [6], but will not be the focus
of this paper.

1 This work was supported in part by the Scientific Computing Division of the National Center for Atmospheric
Research, which is supported by NSF, and in part by Department of Energy grant DE-FG06-93ER25181 and NSF
grant DMS-9303404.

The purpose of this paper is to describe how the full multigrid method (FMG) can be applied 
to the discrete equations that result from the IIM. Many authors have given efficient multigrid 
schemes for both symmetric and nonsymmetric systems of equations that arise from elliptic problems 
with discontinuous coefficients. A partial list includes [7], [8], [9], [10], [11], [12], [13], and [14]. 
For problems with discontinuous coefficients, care must be taken to devise a proper method of 
interpolation for the multigrid process. Much of the work done in this direction has assumed 
that any interfaces are aligned with the grid. However, Aaron Fogelson and James Keener have 
used multigrid schemes to solve non-aligned immersed interface problems for two-dimensional heat 
equations in regions with holes, and to solve for electrical potentials in cardiac tissues, [15]. 

One common approach is to use what is called operator-induced interpolation. That is, the 
stencil for the partial differential equation incorporates information about the jumps in the coef- 
ficients, and this stencil can be modified to produce a stencil for interpolation. Such an idea is 
found in [8] and [9] and has the advantage that explicit information about the interface need not 
be known directly. Black box multigrid can find out from the problem stencil how to interpolate. 
This approach presumably can be used when the interface does not align with the grid, assuming 
the problem was discretized accurately. In the future, we plan to try this approach in conjunction 
with an IIM discretization. 

Here, we present a different approach. Since our stencil for the problem comes from the IIM,
we have all the information about the interface and the jumps there. In this paper, we show how to
use this information to build an $O(h^2)$ accurate interpolation scheme. The results of this approach
seem promising since V-cycle convergence rates of .06 to .13 have been achieved.

The paper is organized as follows. Section 2 gives an overview of the IIM. Section 3 describes our 
multigrid scheme with a derivation of the modified bilinear interpolation. Section 4 gives numerical 
results. Section 5 states the conclusions and avenues for further work. 

2. IMMERSED INTERFACE METHOD OVERVIEW 

In this section, we review the immersed interface method. Details can be found in [4] and
[5]. The IIM provides a discretization of elliptic PDEs that is $O(h^2)$, where $h$ is the uniform mesh
spacing in both the $x$ and $y$ directions. Consider the problem

(1)    $(\beta u_x)_x + (\beta u_y)_y = f(x,y)$ in $\Omega$

       $[u]_\Gamma = w(s)$

       $[\beta u_n]_\Gamma = v(s)$

where boundary conditions on $\Omega$ are given and $\Gamma$ is the interface, across which the jumps in the
solution and flux are assumed known as functions of the arc length $s$. The stencil for a regular
point (all points of the standard 5-point stencil are on the same side of the interface) is the usual
$O(h^2)$ approximation that uses the 5-point stencil for $u$ values and its edge midpoints for $\beta$ values.
To discretize (1) at an irregular point, Li uses a six-point stencil as shown in Figure 1, where *
represents a point $(x^*, y^*)$ on the immersed interface, and looks for a formula at the center point of
the form

(2)    $(\beta u_x)_x + (\beta u_y)_y = \sum_{i=1}^{6} \gamma_i u_i - c + O(h)$

where $u_i$ denotes the $i$-th point in the 6-point stencil, the $\gamma_i$’s are the coefficients to be determined,
and $c$ is a correction term that can be computed once the $\gamma_i$’s are known. Requiring the truncation
error in (2) to be $O(h)$ at the irregular points and the truncation error to be $O(h^2)$ at the regular
points is sufficient to guarantee that a global error of $O(h^2)$ is achieved everywhere.

Fig. 1. 6-pt Stencil for Irregular Points

Let $\xi$ and $\eta$ be the normal and tangential directions at the point $(x^*, y^*)$, which are given by

(3)    $\xi = (x - x^*)\cos\theta + (y - y^*)\sin\theta$

       $\eta = -(x - x^*)\sin\theta + (y - y^*)\cos\theta.$
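
The change of variables in (3) can be sketched in a few lines of Python. This is only an illustration: the function name `to_local` is hypothetical, and `theta` is assumed to be the angle between the x-axis and the interface normal at $(x^*, y^*)$, which the text does not define explicitly.

```python
import math


def to_local(x, y, x_star, y_star, theta):
    """Map (x, y) to local (xi, eta) coordinates following equation (3).

    Assumption: theta is the angle of the interface normal at (x_star, y_star),
    so xi measures distance along the normal and eta along the tangent.
    """
    dx, dy = x - x_star, y - y_star
    xi = dx * math.cos(theta) + dy * math.sin(theta)
    eta = -dx * math.sin(theta) + dy * math.cos(theta)
    return xi, eta
```

For a unit-circle interface, a point lying one unit further out along the normal maps to $(\xi, \eta) = (1, 0)$, up to rounding.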


We then expand $u_i$ about the point $(x^*, y^*)$ on the interface after changing to the $(\xi, \eta)$ variables.
That is,

(4)    $u_i = u^\pm + \xi_i u_\xi^\pm + \eta_i u_\eta^\pm + \tfrac{1}{2}\xi_i^2 u_{\xi\xi}^\pm + \xi_i \eta_i u_{\xi\eta}^\pm + \tfrac{1}{2}\eta_i^2 u_{\eta\eta}^\pm + O(h^3)$

where $\pm$ means to take the + or - limiting value on the outside or inside of the interface, respectively.
Then we have 12 unknown terms on the right hand side in (2) and 6 unknown $\gamma_i$’s. But, since we
know the jumps from (1), the following jump conditions can be derived for the special case where
$\beta = \beta_{in}$ inside the interface and $\beta = \beta_{out}$ outside.

1. $u^+ = u^- + w$

2. $u_\eta^+ = u_\eta^- + w'$

3. $u_\xi^+ = \rho u_\xi^- + v/\beta_{out}$

4. $u_{\xi\eta}^+ = \rho u_{\xi\eta}^- + (1 - \rho)\chi'' u_\eta^- + w'\chi'' + v'/\beta_{out}$

5. $u_{\eta\eta}^+ = u_{\eta\eta}^- + (1 - \rho)\chi'' u_\xi^- + w'' - \chi'' v/\beta_{out}$

6. $u_{\xi\xi}^+ = \rho u_{\xi\xi}^- + (\rho - 1)\chi'' u_\xi^- + (\rho - 1) u_{\eta\eta}^- + \chi'' v/\beta_{out} - w'' + [f]/\beta_{out}$

The variables $w$ and $v$ are functions of $\eta$ only, $\rho = \beta_{in}/\beta_{out}$, the interface is described parametrically
as $\xi = \chi(\eta)$, and all variables in the conditions above have been evaluated at $(\xi, \eta) = (0,0)$ which
corresponds to the * point on the interface. Next, we substitute these six conditions into (4) and
then substitute (4) into (2) to get six equations in six unknowns for the $\gamma_i$’s. Once these are found, $c$
is determined from the $\gamma_i$’s and the jump conditions. The end result gives an $O(h^2)$ approximation
to the exact solution $u$ that satisfies $\beta(u_{xx} + u_{yy}) = f$.
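
The six jump conditions amount to a map from the inside (minus) limiting values to the outside (plus) values. The sketch below writes that map out in Python. It is an illustration under stated assumptions rather than code from the paper: the function name, the dictionary keys, and the argument names (`wp`, `wpp`, `vp` for $w'$, $w''$, $v'$, `chi_pp` for the interface curvature term, `f_jump` for $[f]$) are all hypothetical.

```python
def plus_side_values(um, rho, beta_out, w, wp, wpp, v, vp, chi_pp, f_jump):
    """Apply jump conditions 1-6 to minus-side values and return plus-side values.

    um maps the keys 'u', 'xi', 'eta', 'xixi', 'xieta', 'etaeta' to the inside
    limits of u and its derivatives at the interface point; rho = beta_in / beta_out.
    """
    up = {}
    up["u"] = um["u"] + w                                    # condition 1
    up["eta"] = um["eta"] + wp                               # condition 2
    up["xi"] = rho * um["xi"] + v / beta_out                 # condition 3
    up["xieta"] = (rho * um["xieta"] + (1.0 - rho) * chi_pp * um["eta"]
                   + wp * chi_pp + vp / beta_out)            # condition 4
    up["etaeta"] = (um["etaeta"] + (1.0 - rho) * chi_pp * um["xi"]
                    + wpp - chi_pp * v / beta_out)           # condition 5
    up["xixi"] = (rho * um["xixi"] + (rho - 1.0) * chi_pp * um["xi"]
                  + (rho - 1.0) * um["etaeta"] + chi_pp * v / beta_out
                  - wpp + f_jump / beta_out)                 # condition 6
    return up
```

A useful consistency check is that conditions 5 and 6 together reproduce the jump in the PDE itself: $\beta_{out}(u_{\xi\xi}^+ + u_{\eta\eta}^+) - \beta_{in}(u_{\xi\xi}^- + u_{\eta\eta}^-) = [f]$.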

To use the IIM to generate the problem, the user must specify $w$, $v$, and $[f]$ at control points
$(X, Y)$ along the interface. The program fits a cubic spline through $X$, $Y$, $v$, $w$, and $[f]$ at these
control points to define $X(s)$, $Y(s)$, $v(s)$, $w(s)$, and $[f](s)$ as functions of the arc length parameter
$s$. The quantities in the jump relations are then derivable from these functions. As part of the
procedure, each grid point is typed as being inside, outside, or on the interface, as well as being
regular or irregular.

One advantage of this approach is that the same interface can be used on each grid of a multigrid 
routine. That is, as we refine the grid, we need not refine a grid representing the interface. It is 
sufficient to specify a relatively small number of control points, depending on the smoothness of 
the interface, in order to describe the interface with a spline. Of course, this procedure cannot
handle problems with interfaces that cannot be well represented with a cubic spline. A future
improvement to the implementation of the IIM would be to describe the immersed interface with 
a level set formulation. The coding involved would be reduced significantly and we plan to do this 
before we tackle problems with multiple interfaces. 

3. A FULL MULTIGRID SCHEME 

The result of the IIM is a discrete system of equations, $A^h u^h = f^h$, on the finest grid with
uniform mesh spacing $h$ in each coordinate direction. The goal is to develop a multigrid strategy
to solve this system quickly. Unlike the Black Box multigrid approach of [8] and [9] which uses
operator-induced interpolation, we base our strategy on knowledge of where the interface is located
and the jumps there. We have not yet compared our approach to Dendy’s but we can claim to get
fairly good multigrid rates with our approach for this class of problems.

The basic components of full multigrid are the smoother, the restriction operator, the interpolation
operator, and the coarse grid problem. We now describe what we choose for each. For all our
test cases, point-rowwise Gauss-Seidel worked fine as the smoother. More complicated problems
with larger jumps in the coefficients may require a more sophisticated smoother. The coarse grid
problem, $A^{2h} u^{2h} = f^{2h}$, was taken to be the output of the IIM with mesh size $2h$. This
choice seems to limit the size of the coarsest grid to be 10 x 10 ($h = .4$) for problems with ratios
$\beta_{in}/\beta_{out} = 2000$. It is possible to define $A^{2h}$ to satisfy the Galerkin condition, $A^{2h} = I_h^{2h} A^h I_{2h}^h$,
but this has not been implemented yet. With the exception of the limitation in grid size described
above, our definition of the coarse grid problem worked fine.

Fig. 2. Interpolation for u

The interpolation operator we used is a modified bilinear interpolation in the $(\xi, \eta)$ coordinates
for grid cells that contain an interface. If the cell does not contain an interface, the interpolation
reduces to ordinary bilinear interpolation. To interpolate to the fine grid point at the center of a
coarse grid cell, we build a formula based on the corner values of this cell plus a correction term.
To interpolate to the midpoint of a vertical (horizontal) edge we choose either the cell to the east
or west (or north or south) and find a formula based on the corner values of this chosen cell plus
a correction term. The cell choice depends on the location of the interface relative to the fine
grid point for which we are seeking an interpolated value. For example, if one cell is regular (no
interfaces crossing its boundary) it is preferred over the irregular cell. If both cells are irregular, an
attempt is made to choose the one that will produce the most accurate value.

To describe the scheme mathematically, we consider the chosen coarse grid cell shown in Figure
2 where $u_i$ are the four coarse grid values, $u$ is a fine grid point whose value we wish to find, and *
is a point $(x^*, y^*)$ on an interface cutting through the cell. During the continuation phase of FMG,
we look for a solution to $u$ of the form

(5)    $u = \sum_{i=1}^{4} \gamma_i u_i - c$

much in the same way as the 6-point stencil was found for the PDE in the IIM. Again, let
$\xi_i$ and $\eta_i$ be the transformed variables given in (3) of the previous section, and expand each $u_i$ and
$u$ about $(x^*, y^*)$ on the interface using (4). Using the jump conditions given for the IIM,
we get the system $A_\gamma \gamma = b_\gamma$ for the $\gamma_i$’s after equating the coefficients of $u^-$, $u_\xi^-$, $u_\eta^-$, and $u_{\xi\eta}^-$. The
matrices $A_\gamma$ and $b_\gamma$ are given below,


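(6)    $A_\gamma = \begin{pmatrix} 1 & 1 & 1 & 1 \\ \xi_1\rho_1 & \xi_2\rho_2 & \xi_3\rho_3 & \xi_4\rho_4 \\ \eta_1 + \alpha_1\xi_1\eta_1\tau_1 & \eta_2 + \alpha_2\xi_2\eta_2\tau_2 & \eta_3 + \alpha_3\xi_3\eta_3\tau_3 & \eta_4 + \alpha_4\xi_4\eta_4\tau_4 \\ \xi_1\eta_1\rho_1 & \xi_2\eta_2\rho_2 & \xi_3\eta_3\rho_3 & \xi_4\eta_4\rho_4 \end{pmatrix}$

(7)    $b_\gamma = \begin{pmatrix} 1 \\ \bar\xi\bar\rho \\ \bar\eta + \bar\alpha\bar\xi\bar\eta\bar\tau \\ \bar\xi\bar\eta\bar\rho \end{pmatrix}$
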

and the correction term $c = c_1 - \bar{c}$ where

(8)    $c_1 = \sum_{i=1}^{4} \alpha_i \gamma_i \left( w + \xi_i v/\beta_{out} + \eta_i w' + \xi_i\eta_i (w'\chi'' + v'/\beta_{out}) \right)$

       $\bar{c} = \bar\alpha \left( w + \bar\xi v/\beta_{out} + \bar\eta w' + \bar\xi\bar\eta (w'\chi'' + v'/\beta_{out}) \right).$

In the above equations, if the cell is regular, $\alpha_i = 0$ and $\bar\alpha = 0$. If the cell is irregular, $\alpha_i = 1$ if the
point $(x_i, y_i)$ is outside the interface and $\alpha_i = 0$ if it is inside or on the interface. Likewise, if the
cell is irregular, $\bar\alpha = 1$ if the point $(x, y)$ is outside the interface and $\bar\alpha = 0$ if it is inside or on the
interface. If the point $(x_i, y_i)$ is outside the interface then $\rho_i = \beta_{in}/\beta_{out}$, and $\rho_i = 1$ if it is inside or
on the interface. Likewise, $\bar\rho = \beta_{in}/\beta_{out}$ if the point $(x, y)$ is outside the interface and $\bar\rho = 1$ if it is
inside or on the interface. Also $\tau_i = (1 - \rho_i)\chi''$ and $\bar\tau = (1 - \bar\rho)\chi''$.

Upon examination of these equations, it can be seen that for each irregular coarse grid cell,
we really are calculating two bilinear functions, each of the form $u = a + b\xi + c\eta + d\xi\eta$. Each
function interpolates the coarse grid points on the respective side of the interface (4 conditions).
In addition, the functions are such that the jump conditions $[u]_\Gamma$, $[\beta u_\xi]_\Gamma$, $[u_\eta]_\Gamma$, and $[\beta u_{\xi\eta}]_\Gamma$ are
satisfied at the interface point $(x^*, y^*)$ (the remaining four conditions). Since the terms left off are
$O(h^2)$, the formula is $O(h^2)$ for $u$, relative to the true solution of the partial differential equation.
Hence, these formulas should give good results if the second derivatives are not too large relative
to the mesh spacing.
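
To make the construction of the interpolation weights concrete, the following Python sketch assembles the 4 x 4 system for the $\gamma_i$'s from the local coordinates of the four cell corners and of the target fine grid point, then solves it. It is a minimal sketch, not the author's implementation: the function names and the tuple layout `(xi, eta, rho, alpha)` are hypothetical, and a small Gaussian elimination routine is included only to keep the example self-contained.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(a) for a in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def interpolation_weights(corners, target, chi_pp):
    """Return the four weights gamma_i of the modified bilinear interpolation.

    corners: four (xi, eta, rho, alpha) tuples, one per coarse cell corner.
    target:  (xi, eta, rho, alpha) for the fine grid point being interpolated.
    chi_pp:  second derivative of the interface parametrization at (x*, y*).
    """
    rows = [[], [], [], []]
    for xi, eta, rho, alpha in corners:
        tau = (1.0 - rho) * chi_pp
        rows[0].append(1.0)
        rows[1].append(xi * rho)
        rows[2].append(eta + alpha * xi * eta * tau)
        rows[3].append(xi * eta * rho)
    xi, eta, rho, alpha = target
    tau = (1.0 - rho) * chi_pp
    rhs = [1.0, xi * rho, eta + alpha * xi * eta * tau, xi * eta * rho]
    return solve_linear(rows, rhs)
```

For a regular cell (all $\rho = 1$, all $\alpha = 0$) the weights reduce to ordinary bilinear interpolation: 1/4 at each corner for the cell center, and 1/2 at the two adjacent corners for an edge midpoint.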

During the V-cycle, we need to interpolate the error $e^{2h}$ to the finer grid. The same approach
could be used if we knew $[e^{2h}]_\Gamma$ and $[\beta e^{2h}_n]_\Gamma$ at the interface. These are not known, but if the
smoother is doing a good job, it makes sense to set these jumps to zero. Then the same $\gamma_i$’s that
were calculated during the continuation phase for interpolating $u$ are the proper values to use for
interpolating the error as well. This approach works well in practice as seen in the results in the
next section.

We choose the restriction operator to be a multiple of the transpose of the interpolation operator
just described. In particular, $I_h^{2h} = .25\,(I_{2h}^h)^T$. In the case of regular cells, this reduces to full
weighting. For irregular cells, the stencil has a width of two grid cells in each direction, excluding
other coarse grid points. The data structure used is a 5 x 5 stencil with other coarse grid connections
set to zero.
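
The statement that the scaled transpose reduces to full weighting on regular cells can be checked directly. The sketch below is an illustration only, with hypothetical function names: it computes, for one regular coarse grid point, the bilinear interpolation weight that point contributes to each of its nine neighboring fine grid points, and scales by .25.

```python
def bilinear_weight(di, dj):
    """Bilinear interpolation weight from a coarse point to the fine point
    offset by (di, dj) fine mesh widths, with di, dj in {-1, 0, 1}."""
    return (1.0 - abs(di) / 2.0) * (1.0 - abs(dj) / 2.0)


def restriction_stencil(scale=0.25):
    """3 x 3 restriction stencil at a regular coarse point, taken as
    scale times the transpose of bilinear interpolation."""
    return [[scale * bilinear_weight(di, dj) for dj in (-1, 0, 1)]
            for di in (-1, 0, 1)]
```

The result is the familiar full-weighting stencil with entries 1/16, 1/8, 1/16 in the outer rows and 1/8, 1/4, 1/8 in the middle row, which sum to 1.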

4. NUMERICAL RESULTS 

Several test problems were run using the full multigrid scheme described above. For each
test problem, we use the notation V(a,b) to denote that a pre- and b post-smooths were used in
each V-cycle. More cycles than necessary to reach truncation error were taken for the purpose of
studying the convergence to the solution of the discrete system. In all problems, about 3 V-cycles
were sufficient to reach truncation level. In each Table, derr denotes the difference between the
computed solution and the exact solution of the difference equations and res is the residual. The
grid size given for each Table is that of the finest grid. In the Figures, err is the difference between
the computed solution and the exact solution of the partial differential equation.

Problem 1 

The domain $\Omega$ is the $(-2, 2) \times (-2, 2)$ square and the interface $\Gamma$ is the unit circle. The problem
is

       $\beta(u_{xx} + u_{yy}) = f$

       $\beta_{in} = .5$, $\beta_{out} = 1000$, $f_{in} = 2.0$, $f_{out} = 0$

       $u_{in} = x^2 + y^2$, $u_{out} = x^2 - y^2$

       $[u]_\Gamma = -2y^2$

       $[\beta u_n]_\Gamma = 2\beta_{out}(x^2 - y^2) - 2\beta_{in}$
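
The data for Problem 1 can be cross-checked against the exact solutions with a short Python sketch. The names below are hypothetical, and the value $\beta_{in} = .5$ is taken as the reading consistent with $f_{in} = 2.0$ and $u_{in} = x^2 + y^2$. On the unit circle the outward normal at $(x, y)$ is $(x, y)$ itself, so the normal derivative is $x u_x + y u_y$.

```python
import math

BETA_IN, BETA_OUT = 0.5, 1000.0


def source_terms():
    """f = beta * (u_xx + u_yy) for u_in = x^2 + y^2 and u_out = x^2 - y^2."""
    f_in = BETA_IN * (2.0 + 2.0)
    f_out = BETA_OUT * (2.0 - 2.0)
    return f_in, f_out


def jumps_problem1(angle):
    """Return ([u], [beta u_n]) at the point of the unit circle with the given angle.

    Jumps are taken as outside value minus inside value.
    """
    x, y = math.cos(angle), math.sin(angle)
    u_in, u_out = x * x + y * y, x * x - y * y
    un_in = x * (2.0 * x) + y * (2.0 * y)      # normal derivative of u_in
    un_out = x * (2.0 * x) + y * (-2.0 * y)    # normal derivative of u_out
    return u_out - u_in, BETA_OUT * un_out - BETA_IN * un_in
```

Evaluating `jumps_problem1` at any angle reproduces $-2y^2$ for the jump in $u$ and $2\beta_{out}(x^2 - y^2) - 2\beta_{in}$ for the jump in the flux.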

Table 1 shows rates of each V-cycle to be .13 for both the discrete error and the residual for a
2-level scheme on a 40 x 40 grid with 2 pre- and 2 post-smooths. Notice that the modified bilinear
interpolation used in continuation gave a starting guess on the finest grid with an error of .02. This is
good since the mesh size on the finest grid is h = .1.


Cycle      ||derr||_inf    ||res||_inf     rate_derr   rate_res
Starting   .20 x 10^-1     .50 x 10^4
1-V        .23 x 10^-2     .13 x 10^3      .12         .03
2-V        .26 x 10^-3     .57 x 10^1      .11         .05
3-V        .34 x 10^-4     .76 x 10^0      .13         .13
4-V        .44 x 10^-5     .96 x 10^-1     .13         .13
5-V        .56 x 10^-6     .12 x 10^-1     .13         .13
6-V        .73 x 10^-7     .16 x 10^-2     .13         .13
7-V        .94 x 10^-8     .21 x 10^-3     .13         .13

Table 1. Problem 1: V(2,2), 40x40, 2-levels
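
The rate columns are the ratios of successive norms. A minimal Python sketch of that computation, using the discrete-error column of Table 1 as input (the function name is hypothetical):

```python
def convergence_rates(norms):
    """Ratio of each norm to the previous one; smaller means faster convergence."""
    return [norms[k] / norms[k - 1] for k in range(1, len(norms))]


# Discrete error norms from Table 1, starting guess first, then cycles 1-V to 7-V.
derr = [0.20e-1, 0.23e-2, 0.26e-3, 0.34e-4, 0.44e-5, 0.56e-6, 0.73e-7, 0.94e-8]
rates = convergence_rates(derr)
```

Because the tabulated norms are rounded to two digits, the recomputed ratios agree with the printed rates only to about two digits.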


Table 2 gives 2-level V(4,4) results for Problem 1. Notice that the rates went down from .13 to .06. 

Cycle      ||derr||_inf    ||res||_inf     rate_derr   rate_res
Starting   .20 x 10^-1     .50 x 10^4
1-V        .37 x 10^-3     .91 x 10^1      .02         .002
2-V        .25 x 10^-4     .34 x 10^0      .07         .037
3-V        .14 x 10^-5     .20 x 10^-1     .06         .06
4-V        .93 x 10^-7     .13 x 10^-2     .07         .07
5-V        .60 x 10^-8     .83 x 10^-4     .06         .06
6-V        .39 x 10^-9     .53 x 10^-5     .06         .06
7-V        .25 x 10^-10    .34 x 10^-6     .06         .06

Table 2. Problem 1: V(4,4), 40x40, 2-levels


Table 3 gives 3-level results for an 80 x 80 fine grid and V(4,4). Notice that we still get rates of .06 
with 3-levels. Also note that even though the level 2 problem was solved with only 1 V-cycle, the 
starting error for level 3 was .016. 


Cycle      ||derr||_inf    ||res||_inf     rate_derr   rate_res
Starting   .16 x 10^-1     .15 x 10^5
1-V        .96 x 10^-3     .28 x 10^3      .06         .02
2-V        .15 x 10^-4     .28 x 10^1      .02         .01
3-V        .78 x 10^-6     .64 x 10^-1     .05         .02
4-V        .51 x 10^-7     .38 x 10^-2     .06         .06
5-V        .28 x 10^-8     .14 x 10^-3     .05         .04
6-V        .19 x 10^-9     .90 x 10^-5     .07         .06

Table 3. Problem 1: V(4,4), 80x80, 3-levels


Figure 3 shows the computed solution for this problem with V(4,4) for an 80 x 80 fine grid after 7
V-cycles. Notice the sharpness of the jump at the interface. Figure 4 shows the associated err. This
error, $O(10^{-5})$, is concentrated along the interface as expected since the truncation error is largest
there. We note that the discrete error, derr, is $O(10^{-11})$ and is much smoother at the interface due
to the multigrid smoothing.


Problem 2

The problem domain $\Omega$ is the $(-2,2) \times (-2,2)$ square and the interface $\Gamma$ is the unit circle.
The problem is

       $\beta(u_{xx} + u_{yy}) = f$

       $\beta_{in} = 1$, $\beta_{out} = 1000$, $f_{in} = f_{out} = 2000$

       $u_{in} = 1000x^2$, $u_{out} = x^2$

       $[u]_\Gamma = -999x^2$

       $[\beta u_n]_\Gamma = 0$, $[u_n]_\Gamma = -999(2x^2)$

Fig. 3. u for Problem 1.

Fig. 4. err for Problem 1.


Note that this problem has a jump in the normal derivative at the interface even though the jump 
in the flux is zero. Table 4 shows rates for a 2-level method with V(4,4) to be .03 for the discrete 
error and .06 for the residual. 

Table 4. Problem 2: V(4,4), 40x40, 2-levels


Of special note in Table 4 is the starting error produced by the modified bilinear interpolation during
continuation. At first sight this error of 10 looks quite bad. But notice that $u_{xx} = 2000$ for points
inside the interface, and the term $\tfrac{1}{2}(x - x^*)^2 u_{xx}$ that is not included in the bilinear interpolation is
exactly 10. In fact, Figure 5 shows the starting error to be very sharp at the interface, reflecting the
fact that the truncation error has a different constant for points inside and outside the interface.
This is the best we can hope to accomplish with bilinear interpolation for this problem. We do not
plot the solution and error for this problem since the graphs are quite similar to Problem 1 in that
the jumps are captured very sharply.
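
The figure of 10 quoted for the neglected term can be checked by writing the arithmetic out. The offset $x - x^* = 0.1$ used here is an assumption (one fine mesh width, $h = .1$, on the 40 x 40 grid); the text does not state it explicitly.

```latex
\tfrac{1}{2}\,(x - x^*)^2\,u_{xx} \;=\; \tfrac{1}{2}\,(0.1)^2\,(2000) \;=\; 10
```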

Problem 3


As an application we consider single-phase saturated flow governed by Darcy’s law,

(9)    $\mathbf{u} = -\beta \nabla p$

       $\nabla \cdot \mathbf{u} = 0$

where $\mathbf{u} = (u,v)^T$ is the velocity vector and $\beta = \kappa/\mu$ with a discontinuous permeability at the
interface. Such problems arise in groundwater flow and contaminant transport. Combining the
equations above we get the elliptic equation

(10)   $\nabla \cdot (\beta \nabla p) = 0$

       $[p]_\Gamma = 0$

       $[\beta p_n]_\Gamma = 0$

for the pressure $p$. Equation (10) is then discretized with the IIM and solved using multigrid. The
velocities of the flow are then determined from (9).

A strategy similar to the one used for modified bilinear interpolation can be used to devise an
$O(h)$ formula for $p_x$ and $p_y$ in cells with interfaces. One could also get $O(h)$ formulas by using
one-sided differences on the correct side of the interface. If the pressures, $p$, are calculated by multigrid
on a grid of size $h$, modified bilinear interpolation can be used to give $p$ at cell centers and edges
on a grid of size $h/2$. Then the needed information is available to find derivatives to $O(h)$ at grid
points of the $h/2$ grid. A more exact, though more expensive, method is to calculate pressures on
a grid of size $h/2$ for use in the derivative calculation on a grid of size $h$. This was the approach
taken in the results that follow.
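
The remark about one-sided differences can be illustrated in one dimension. The sketch below is not
the procedure used for the results that follow; it uses a hypothetical profile with a kink at an
assumed interface location x = 0.5 and compares a difference taken entirely on one side of the
interface with one whose stencil crosses it. All names and values are illustrative.

```python
import math

A = 0.5  # hypothetical interface location


def p(x):
    """Continuous pressure-like profile with a derivative jump at x = A."""
    if x <= A:
        return math.sin(x)
    return math.sin(A) + 5.0 * (x - A) + (x - A) ** 2


def dp_exact_left(x):
    """Exact derivative on the left side of the interface."""
    return math.cos(x)


def backward_diff(x, h):
    """One-sided difference that uses only points left of the interface."""
    return (p(x) - p(x - h)) / h


def forward_diff(x, h):
    """Difference whose stencil crosses the interface."""
    return (p(x + h) - p(x)) / h


def errors(h):
    x0 = A - 0.3 * h  # grid point just left of the interface
    exact = dp_exact_left(x0)
    return abs(backward_diff(x0, h) - exact), abs(forward_diff(x0, h) - exact)


err_one_sided_h, err_crossing_h = errors(0.01)
err_one_sided_h2, _ = errors(0.005)
print(err_one_sided_h, err_one_sided_h2, err_crossing_h)
```

Halving h roughly halves the one-sided error, which is the $O(h)$ behavior described above, while the
difference that crosses the interface keeps an error that does not shrink with h.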

Once derivatives are found, we solve the equation

(11)    $q_t + \mathbf{u} \cdot \nabla q = 0$

for advection of a contaminant with concentration $q$. This is done with LeVeque’s Clawpack
software on a uniform grid ([16], [17]). For the test problem we take $\Omega$ to be the $(-2,2) \times (-2,2)$
square and $\Gamma$ to be the interface shown in Figure 6. On the square, $p = 1$ at the left boundary,
$p = 0$ at the right boundary, and $p_y = 0$ at the top and bottom boundaries. The permeabilities are
$\beta = 5$ inside the interface and $\beta = 1$ outside the interface. Initially, $q = 0$; at the left (inflow)
boundary $q = 1$, and an 80 x 80 computational grid is used.

Figure 6 shows the velocities that were determined by differencing the pressure that came
out of the multigrid routine. Since $\beta$ is larger inside the interface, the velocity should move the
contaminant more quickly through this region than around it. This is what is observed at four times, as
shown in Figure 7. Our approach gave sharp results for the moving front of the contaminant
even though the Clawpack routine used had no knowledge of where the interface was located.

[Figure panel: q at time t = 0]

Fig. 7. Contours of q for Problem 3.


5. CONCLUSIONS 


We have demonstrated that a full multigrid algorithm can be designed for interface problems
where the jumps in coefficients, solution, derivatives, flux, or source term are not aligned with the
underlying Cartesian grid. This algorithm correctly solves the fine grid problem generated by Li’s
IIM and hence gives a second-order accurate solution to the partial differential equation.

The multigrid solution for Problem 1, with jumps in the coefficient $\beta$, the solution, the flux,
and the source term, was obtained at a rate of .13 using V(2,2) with 2 levels, .06 using V(4,4) with
2 levels, and .06 using V(4,4) with 3 levels. For Problem 2, with a large jump in $[u_n]$ but $[\beta u_n] = 0$,
we obtained rates of .03 for errors and .06 for residuals using V(4,4) and 2 levels.

In order to achieve such rates, a modified bilinear interpolation scheme that takes advantage
of known jumps in the problem at the interface as well as knowledge of where the interface is
located was developed. If the second derivatives of $u$ (for continuation) or of the discrete error (for the
V-cycle) are not too big, this interpolation can be expected to give good results to $O(h^2)$. If a coarse
grid cell is regular, then the modified interpolation reduces to ordinary bilinear interpolation, and
restriction becomes full-weighting. For V-cycle interpolation, the assumption that $[e]_\Gamma = 0$ and
$[\beta e_n]_\Gamma = 0$ seems to be a reasonable one since we achieved a factor of 7 to 10 improvement over the
pre-smoothed result after doing coarse grid correction.

This multigrid approach was used successfully to generate pressures from which velocities were 
obtained for the groundwater flow application in Problem 3. The contaminant was advected in 
this velocity field using a Clawpack routine that did not know about the location of the interface. 
Results showed that the contaminant front was very sharp. 

There are still many improvements that can be made or questions that should be answered. 
First, the coarse grid problems come directly from an immersed interface formulation on the given 
grid level, not from a Galerkin condition of the fine grid problem. It is possible that one could 
use even coarser grids if a Galerkin approach is used. In addition, a Galerkin formulation may 
be more amenable to different smoothing strategies than our approach and could be beneficial 
when more complicated problems are tackled. Second, we plan to compare this approach to the 
operator-induced interpolation approaches that others have taken. In particular, Dendy’s Black 
Box solver for nonsymmetric problems, [9], could take the 6-point stencil generated by the IIM 
and infer an interpolation strategy, as well as automatically determining the coarser grids without 
explicit knowledge of the interface.


Smoothers for Optimization Problems


Eyal Arian* and Shlomo Ta’asan†


Abstract 

We present a multigrid one-shot algorithm, and a smoothing analysis, for the numerical solu- 
tion of optimal control problems which are governed by an elliptic PDE. The analysis provides a 
simple tool to determine a smoothing minimization process which is essential for multigrid appli- 
cation. Numerical results include optimal control of boundary data using different discretization 
schemes and an optimal shape design problem in 2D with Dirichlet boundary conditions. 


1 Introduction 

In this work we use multigrid methods to accelerate the numerical solution of optimization problems 
governed by an elliptic PDE. The necessary conditions for a minimum are given as a set of three 
equations: state, costate and design. The state equation is a PDE which depends on the design 
variables. The costate equation is a PDE for the Lagrange multipliers and is of the same type as 
the state PDE. In an optimal shape design (OSD) problem the design variable is the position of the
boundary; therefore the design equation is defined only on the boundary.

Based on the necessary conditions for the minimum, the gradient of the cost-function with respect 
to the discrete design variables is given by the residuals of the design equation (assuming that the 
residuals of the state and costate equations are zero). A gradient based algorithm can then be 
constructed by an iterative method which solves sequentially the state and the costate equations and 
then updates the design variables with the gradients. 

Multigrid (MG) methods can accelerate this process in various ways. In [1] A. Jameson used a
MG cycle to solve the state and costate equations in an aerodynamic shape design problem. Later,
a “one-shot” method was proposed by S. Ta’asan [2] and applied to aerodynamic shape design by
S. Ta’asan, G. Kuruvila and M. D. Salas [3, 4]. It uses a few coarse grids for the optimization
process, with the design variables restricted to a finite dimensional design space which corresponds
to smooth solutions. In [5, 6, 7] the MG one-shot method was extended to the infinite design space,
in which the design variables are updated on all levels as originally suggested by A. Brandt [8]. The
main difficulty there is to provide a minimization algorithm which smooths the design variables.
We present a simple Fourier analysis which estimates the smoothing of the minimization process and
provides a tool to establish smoothers by preconditioning if needed.

Numerical examples include a linear optimal boundary control problem using different discretiza- 
tion schemes and a non-linear optimal shape design problem using a body-fitted grid. Results are 
given in two dimensions. 

*ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23681 
†Dept. of Mathematics, Carnegie-Mellon University, Pittsburgh, PA 15213

2 The Problem


We address two classes of optimization problems which are strongly related. One class is “optimal 
shape design” (OSD) problems where the shape of the domain, in which a PDE is solved, is the 
variable to be optimized with respect to the PDE solution. These are generally non-linear problems 
which arise in many applications. A simpler class of optimization problems is optimal control of 
boundary data in a fixed domain boundary value problem. These problems are related to OSD 
problems using the small disturbance approximation. In the following, formal definitions of the above 
are given. 

2.1 Optimal Shape Design 

Let $\mathcal{O}$ be a bounded set in $\mathbb{R}^d$, and let $\Omega$ be a closed subdomain in $\mathcal{O}$. The problem is to find an
optimal domain $\Omega^* \in \mathcal{O}$ and a “state variable”, $\phi \in L^2(\Omega^*)$, subject to the “state equation”, such
that a given cost function, $F(\Omega, \phi(\Omega))$, defined on $\mathcal{O} \times L^2(\Omega)$, will be minimized:

$\min_{\Omega} F(\Omega, \phi(\Omega))$    (2.1)

where $\phi$ satisfies the following PDE

$L\phi = f$ on $\Omega$
$B\phi = 0$ on $\partial\Omega$    (2.2)

where $L$ ($x \in \Omega$) is an elliptic differential operator of order $2m$ defined on $\Omega$ and $B$ a boundary
operator.

Extensive coverage of OSD theory and references can be found in [9, 10, 11].



2.2 The Small Disturbance Approximation 


Consider a solution $\phi_0$ of the state equation (2.2) in a domain $\Omega_0$. Let $\Gamma$ be part of the boundary
(depending on the problem), $\Gamma \subset \partial\Omega$, and consider a perturbation of the boundary position $\Gamma$ with $\epsilon\alpha$
(see Fig. 1):

$\Gamma(x') \leftarrow \Gamma(x') + \epsilon\,\alpha(x')\,n(x')$    (2.3)

where $n(x')$ is the normal to the boundary.

The perturbed boundary, $\Gamma_{\epsilon\alpha}$, defines a domain, $\Omega_{\epsilon\alpha}$, with a solution $\phi_{\epsilon\alpha}$. The solution is extended
analytically to a neighborhood containing $\Omega \cup \Omega_{\epsilon\alpha}$. The extension is denoted as the original function.
The following are relations between quantities on the perturbed domain, $\Omega_{\epsilon\alpha}$, and those on the original
one:

(2.4) 

(2.5) 


ife L = + + 0(£2) 


B(j>ea =B<j> + sB + eB a (<f>)a + 0(e 2 ) 

1 cq 1-0 1 0 A 0 


The second order terms in (2.4) and (2.5) can be neglected for sufficiently small $\epsilon$, depending on $\alpha$.
For example, if $\alpha$ is composed of a single Fourier frequency $\omega$, $\alpha = e^{i\omega x}$, then $\epsilon$ should be smaller than
the wavelength, i.e. $\epsilon \ll 1/|\omega|$.

Figure 1: Perturbation of the boundary by $\epsilon\alpha$ in the normal direction.


2.2.1 The Small Disturbance Minimization Problem 


Relations (2.4) and (2.5) are used to reduce the OSD problem (2.1)-(2.2) to a minimization problem 
on a fixed domain with some unknown boundary data. In this problem 0 and are fixed, while 
u = a and p = <f> are the design and the state variables, respectively. The optimization problem is 
given by 


min F(p,u ) 


( 2 . 6 ) 


where <p satisfies 


{ . L u)(p = 0 on 

BMp = -B a (<f>)u on T. 


(2.7) 


2.3 Optimal Control of Boundary Data 

The following problem is a more general formulation of the minimization problem which arises when 
performing the small disturbance approximation to optimal shape design problems. 

Let $\Omega$ be a bounded open set of $\mathbb{R}^d$ with smooth boundary $\Gamma$ and let $\phi$ be a real valued function on
$\Omega$. Let $U$ and $W$ be Hilbert spaces of real valued functions which are defined on $\Gamma$ and $\Omega$, respectively.

The problem is to find the “design variable”, $u \in U$, and the “state variable”, $\phi \in W$, such that
a given cost function, $F(u, \phi(u))$, defined on $U \times W$, will be minimized. Here $\phi$ satisfies an elliptic
PDE which is defined on $\Omega$ and will be referred to as the “analysis problem” or the “state equation”:

$\min_{u \in U} F(u, \phi(u))$ on $\Gamma$
$L(\phi, u) = 0$ on $\Omega$    (2.8)


3 Derivation of the Necessary Conditions for a Minimum 

We apply the adjoint method to the optimal boundary control problem (2.8). The variable space is
enlarged by adding Lagrange multiplier functions, or costate variables, denoted by $\lambda$. A Lagrangian is
defined to be the sum of the cost function and a linear term in the costate variables which vanishes
when the constraint equation is satisfied:

$E(\phi, \lambda, u) = F(u, \phi) - \langle \lambda, L(\phi, u) \rangle$.    (3.1)

A perturbation of the Lagrangian with respect to all the variables independently, i.e., state, costate
and design, results in a variation of the Lagrangian:

$\phi \leftarrow \phi + \epsilon\tilde{\phi}$
$\lambda \leftarrow \lambda + \epsilon\tilde{\lambda}$    (3.2)
$u \leftarrow u + \epsilon\tilde{u}$

with $\tilde{\lambda} \in L^2(\Omega)$, $\tilde{u} \in U$, and $\epsilon$ a small real parameter. The variation of the Lagrangian,
$\delta E$, to first order in $\epsilon$, is given in the following form:

$\delta E = -\epsilon \langle \tilde{\phi}, L^*_\phi(\phi, u)\lambda + F_\phi \rangle - \epsilon \langle \tilde{\lambda}, L(\phi, u) \rangle + \epsilon \langle \tilde{u}, L^*_u(\phi, u)\lambda + F_u \rangle$    (3.3)

where $L^*_\phi$ and $L^*_u$ are the adjoint operators of $L_\phi$ and $L_u$, respectively. The requirement that the first
order terms vanish results in the necessary conditions for a minimum, which will be referred
to as the state, the costate, and the design equations:

State:    $L(\phi, u) = 0$
Costate:  $L^*_\phi(\phi, u)\lambda + F_\phi(\phi, u) = 0$    (3.4)
Design:   $L^*_u(\phi, u)\lambda + F_u(\phi, u) = 0$.

From here on we will use the notation $A(u)$ for the design equation residual, i.e.,

$A(u) = -L^*_u(\phi(u), u)\lambda(u) - F_u(\phi(u), u)$    (3.5)

where $\phi(u)$ and $\lambda(u)$ in (3.5) are solutions of the state and costate equations.

The application of the adjoint method to optimal shape problems is performed in a similar manner
[5, 7].
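
As an illustration of the necessary conditions, the following sketch checks on a small
finite-dimensional problem that the design equation residual, evaluated with solved state and costate
equations, agrees with the negative of the cost gradient obtained by finite differences. Everything in
it is an assumption made for the example: the 2 x 2 operator K, the vector B, the target state, the
weight ETA, and the choice of the Lagrangian $F + \lambda^T (K\phi - Bu)$, whose sign is picked so that the costate
and design equations take the form "adjoint term plus derivative of F equals zero".

```python
def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


K = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical discrete state operator
B = [1.0, 0.0]                  # how the scalar design u enters the state equation
PHI_TARGET = [1.0, 0.5]         # hypothetical target state
ETA = 0.1                       # weight of the design penalty


def solve_state(u):
    """State equation L(phi, u) = K phi - B u = 0."""
    return solve2(K, [B[0] * u, B[1] * u])


def cost(u):
    phi = solve_state(u)
    misfit = sum((phi[i] - PHI_TARGET[i]) ** 2 for i in range(2))
    return 0.5 * misfit + 0.5 * ETA * u * u


def design_residual(u):
    """A(u) = -(L_u^* lam + F_u), with lam from the costate equation."""
    phi = solve_state(u)
    K_T = [[K[0][0], K[1][0]], [K[0][1], K[1][1]]]
    # Costate equation: K^T lam + (phi - target) = 0.
    lam = solve2(K_T, [-(phi[0] - PHI_TARGET[0]), -(phi[1] - PHI_TARGET[1])])
    # L_u = -B, so L_u^* lam = -(B . lam); F_u = ETA * u.
    return -(-(B[0] * lam[0] + B[1] * lam[1]) + ETA * u)


u = 0.7
eps = 1e-6
fd_gradient = (cost(u + eps) - cost(u - eps)) / (2 * eps)
print(design_residual(u), -fd_gradient)
```

In this example one state solve and one costate solve give the same derivative information as
perturbing the design variable, which is the point of the adjoint method.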


4 Discretization 

When discretizing the problem it is possible either to derive the necessary conditions for a minimum 
in the continuous formulation and then discretize or to discretize the cost-function together with 
the state equation and then derive the discrete necessary conditions. In the latter case the discrete 
minimization problem is given by: 

$\min_{u^h} F^h(u^h, \phi^h)$ on $\Gamma^h$
$L^h(\phi^h, u^h) = 0$ on $\Omega^h$.    (4.1)

As the grid mesh size, $h$, goes to zero, solutions of both approaches should converge to the differential
solution. However, for a finite mesh size, discretization and necessary conditions do not necessarily
commute. The solutions of both should be within the discretization error of the differential solution.
In this paper we used both strategies. In the optimal boundary control problem, the derivation of
the necessary conditions for a minimum was done in the continuous space, and then these conditions
were discretized, while in the optimal shape problem the necessary conditions for a minimum were
derived in the discrete space. The discrete state, costate and design equations are:

$L^h(\phi^h, u^h) = 0$ on $\Omega^h$
$L^{h*}_\phi(\phi^h, u^h)\lambda^h + F^h_\phi(\phi^h, u^h) = 0$ on $\Omega^h$    (4.2)
$L^{h*}_u(\phi^h, u^h)\lambda^h + F^h_u(\phi^h, u^h) = 0$ on $\Gamma^h$

We define $A^h(u^h)$ similarly to (3.5).

5 A Gradient Descent Algorithm 

If the state and costate equations are satisfied then the gradient of the cost-function with respect to 
the design variables is given by the residuals of the design equation (see [5, 6, 7]): 

$\nabla_u F = A(u)$.

The following is a gradient descent minimization algorithm which follows immediately from the above.

1. Start with an initial approximation for the design variables, $u^h_0$.

2. Solve the state equation for $\phi^h$.

3. Solve the costate equation for $\lambda^h$.

4. Compute the amplitude of the perturbation, $\beta$, with a line search,
and update the design variables: $u^h \leftarrow u^h + \beta A^h(u^h)$.

5. If the residuals of the state, the costate and the design
equations are greater than some preassigned value, in $L^2$ norm,
then go to 2; else stop.

Note that steps 2, 3 and 5 consist of a global computation over the whole domain. 
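
The five steps above can be sketched on a toy problem. The code below is a minimal illustration, not
the implementation used in this paper: the 2 x 2 state operator, the scalar design variable, the
quadratic cost and the backtracking line search are all assumptions made for the example, and the
design residual is taken as the negative cost gradient so that a positive step decreases the cost.

```python
def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


K = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical discrete state operator (symmetric)
B = [1.0, 0.0]                  # how the scalar design u enters the state equation
PHI_TARGET = [1.0, 0.5]         # hypothetical target state
ETA = 0.1                       # weight of the design penalty


def solve_state(u):
    return solve2(K, [B[0] * u, B[1] * u])


def solve_costate(phi):
    # K is symmetric here, so K^T = K.
    return solve2(K, [-(phi[0] - PHI_TARGET[0]), -(phi[1] - PHI_TARGET[1])])


def cost(u):
    phi = solve_state(u)
    misfit = sum((phi[i] - PHI_TARGET[i]) ** 2 for i in range(2))
    return 0.5 * misfit + 0.5 * ETA * u * u


def design_residual(u, lam):
    return B[0] * lam[0] + B[1] * lam[1] - ETA * u


def gradient_descent(u0, tol=1e-6, max_iter=200):
    u = u0                                   # step 1: initial design
    for iteration in range(max_iter):
        phi = solve_state(u)                 # step 2: state equation
        lam = solve_costate(phi)             # step 3: costate equation
        a = design_residual(u, lam)
        if abs(a) <= tol:                    # step 5: state and costate are solved exactly here
            return u, iteration
        beta = 1.0                           # step 4: backtracking line search
        while cost(u + beta * a) >= cost(u) and beta > 1e-12:
            beta *= 0.5
        u += beta * a
    return u, max_iter


u_opt, iterations = gradient_descent(0.0)
print(u_opt, iterations)
```

Each pass through the loop requires a full state solve and a full costate solve, which is the global
work referred to in the note above.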

The complexity of this algorithm is $O(M^p N^l)$, where $M$ is the number of design parameters,
$N$ is the number of grid points, and $p$ and $l$ are integers which depend on the problem and the
PDE solver which is used to solve the state and costate equations. For example, if an MG solver is
used to solve the PDE then $l = 1$.

6 Relaxation of the Design Variables in a Multigrid Cycle 

The Full Approximation Scheme (FAS) is used to represent the state, the costate and the design
equations on coarser grids. On each level a relaxation is performed on the state, costate and design
variables. The state and costate equations, which are elliptic PDEs, are relaxed by Gauss-Seidel or
damped Jacobi relaxation. The design variables are relaxed by

$u^h \leftarrow u^h + \beta^h \mathcal{F}^h A^h(u^h)$,    (6.1)

where $\beta^h$ and $\mathcal{F}^h$ are chosen to guarantee good smoothing for the design variables and where $A^h(u^h)$
are the residuals of the design equation. The choice of $\mathcal{F}^h$ is discussed in Sec. 8. This step should be
followed by an update of the state and costate solutions. $\beta^h$ and $\mathcal{F}^h$ are constructed so
that the boundary data is updated with a quantity dominated by high frequencies.

6.1 Ellipticity and Computational Cost

In elliptic systems a perturbation of the boundary condition with a Fourier mode $e^{i\omega x}$ has an exponentially
decaying effect on the interior solution of the form $e^{-\sigma(\omega) y}$, where $y$ is the distance from the
boundary and $\sigma(\omega)$ is a positive monotonically increasing function of $\omega$ for large $|\omega|$ [12]. For the
Laplace equation the decay is given by $e^{-|\omega| y}$. Therefore, in an MG scheme it is preferable to
perturb the boundary condition with only high frequency modes relative to the given level. In that
case only local relaxations will be needed in order to update the solutions after each optimization
step. Also in non-linear problems the line search procedure, which calculates the amplitude of the
minimization step ($\beta$), requires a trial perturbation of the boundary. As a result of the local effect of
such a perturbation the computational cost of the minimization step is only $O(\sqrt{N})$ operations (in
two dimensions).

On the coarsest grid the relaxation of the design variables is given by Eqn. (6.1) with $\mathcal{F} = I$, thus
taking into account the lowest frequencies. In that case the state and costate PDEs are solved over
the whole domain.
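
The locality argument can be made concrete for the Laplace equation. The sketch below uses the
continuous decay rate $\sigma(\omega) = |\omega|$ (the discrete rate on a grid differs somewhat) to confirm that a boundary
mode is harmonic and to estimate how many mesh widths it penetrates before falling below a
tolerance. The tolerance of 1e-3 and the function names are assumptions made for the example.

```python
import math


def boundary_mode(omega, x, y):
    """Real part of exp(i*omega*x) * exp(-|omega|*y): harmonic, decaying away from y = 0."""
    return math.cos(omega * x) * math.exp(-abs(omega) * y)


def discrete_laplacian(f, x, y, d=1e-3):
    """Five-point finite-difference Laplacian, used only to confirm harmonicity."""
    return (f(x + d, y) + f(x - d, y) + f(x, y + d) + f(x, y - d) - 4.0 * f(x, y)) / d**2


def penetration_depth(theta, tol=1e-3):
    """Distance, in mesh widths, at which a mode with omega*h = theta has decayed to tol."""
    return math.log(1.0 / tol) / abs(theta)


omega = math.pi / 2
residual = discrete_laplacian(lambda x, y: boundary_mode(omega, x, y), 0.3, 0.5)
depth_smoothest_high = penetration_depth(math.pi / 2)
depth_roughest_high = penetration_depth(math.pi)
print(residual, depth_smoothest_high, depth_roughest_high)
```

Under these assumptions a high-frequency perturbation is felt over only a few grid lines next to the
boundary, which is why local relaxations suffice after a design update on the finer levels.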


7 Smoothing Analysis 

The Fourier analysis is based on calculating the symbol of the transformation between errors in 
the design variables and residuals of the design equation. The analysis can be done either in the 
continuous or discrete level. In the following we present the continuous analysis. The advantage of 
performing the analysis at the continuous level is the elimination of the effect of specific discretization 
on the above transformation. One objective of the analysis is to determine if the problem is well posed 
in the sense that small changes in the residuals of the design equation correspond to small changes 
in the design variables. 

7.1 Reduction to the Boundary 

The analysis is done by considering the high frequency errors in the design variables in half space 
(Fig.2). Then with a standard procedure the problem in half space is reduced to the boundary [13]. 

We assume that the state and costate equations are satisfied when the design variables are updated. 
Another assumption is that in the vicinity of the boundary, the non-smooth errors can be analyzed 
using a half space geometry. This approximation is valid since in elliptic problems non-smooth Fourier 
modes decay exponentially into the interior. For simplicity the analysis is performed for a second 
order elliptic equation in a two dimensional space. 

Consider a two dimensional geometry where the x axis is parallel to the boundary and the y 
axis is in the normal direction (see Fig.2). We want to study the mapping from errors in the design 
variables to the residuals of the design equation. The errors of the state and costate variables satisfy 
a homogeneous equation in the interior. The state, the costate, and the design errors are given by 

$\tilde{\phi}(x, y) = \int_{-\infty}^{\infty} \hat{\phi}(\omega)\, e^{i\omega x} e^{-\sigma(\omega) y}\, d\omega$
$\tilde{\lambda}(x, y) = \int_{-\infty}^{\infty} \hat{\lambda}(\omega)\, e^{i\omega x} e^{-\sigma(\omega) y}\, d\omega$    (7.1)
$\tilde{u}(x) = \int_{-\infty}^{\infty} \hat{u}(\omega)\, e^{i\omega x}\, d\omega$

where $L\, e^{i\omega x} e^{-\sigma(\omega) y} = 0$ and $\sigma(\omega) > 0$ (for the Laplace operator $\sigma(\omega) = |\omega|$). By substituting these
expressions into the boundary conditions of the state and costate error equations, we obtain relations
between $\hat{\phi}(\omega)$, $\hat{\lambda}(\omega)$ and $\hat{u}(\omega)$. From the set of boundary conditions for the state and the costate
equations and from the design equation (which is defined on the boundary) we can deduce a relation
between the errors in the design variables and the residuals of the design equation:

$\hat{A}(\omega) = \hat{T}(\omega)\, \hat{u}(\omega)$.    (7.2)

Figure 2: A vicinity of a point on the boundary is transformed into a half space geometry.

$\hat{T}(\omega)$ is the symbol of the Hessian of the cost function, $F$, subject to the PDE constraint. In this
work we use this symbol to estimate the smoothing properties of the minimization procedure. If
the symbol $\hat{T}(\omega)$ is a monotonically decreasing function of $\omega$ then one expects
that the relaxation of the design variable will not be a good smoother. On the other hand, if $\hat{T}(\omega)$ is
a monotonically increasing function of $\omega$, for large $|\omega|$, then high frequency errors in the shape are
amplified in the residuals of the design equation and good smoothing of the minimization process is
anticipated. Note that this analysis deals only with the high frequencies.

7.2 Analysis of Optimal Shape Design Problems 

The optimal shape problem is reduced to an optimal control of boundary data problem by the small
disturbance approximation as explained in Sec. 2.2. However, in this case the resulting equations have
variable coefficients and a more delicate analysis is required. This is done by freezing the coefficients
at a point $x_0$, which is justified as long as the changes in the design variables are highly oscillatory
compared to changes in the coefficients appearing in the small disturbance problem. As a result
of such an analysis one obtains the transformation between errors in the shape variables and the
residuals of the design equation in the neighborhood of $x_0$:

$\hat{A}(\omega, a_0) = \hat{T}(\omega, a_0)\, \hat{\alpha}(\omega)$    (7.3)

where $a_0$ stands for a quantity which is computed at $x_0$.

We are interested in the amplification factor of the error in the design variables as a result of the
multigrid minimization process. The relation between the errors in the design variables before and
after the relaxation follows from Eqns. (6.1) and (7.2):

$\hat{u}^h_{\rm new}(\theta) = R^h(\theta)\, \hat{u}^h_{\rm old}(\theta)$    (8.1)

where the relaxation symbol $R^h(\theta)$ is given by:

$R^h(\theta) = 1 + \beta^h \hat{\mathcal{F}}^h(\theta)\, \hat{T}^h(\theta)$.    (8.2)

For multigrid purposes it is desirable that $R^h(\theta)$ has small values in the high frequency range
($\pi/2 \le |\theta| \le \pi$). If this is the case, the relaxation will effectively reduce the high frequency errors of the
design variables prior to restricting their values to the coarse grid.


Choice of Preconditioner 

In some cases the relaxation, without the use of a preconditioner, does not smooth the errors
effectively for any choice of $\beta^h$. In these cases preconditioning of the design residuals is required. If
chosen properly, the symbol $\hat{\mathcal{F}}^h(\theta)\hat{T}^h(\theta)$ is dominated by the high frequencies and a proper choice
of $\beta^h$ will result in good smoothing. The preconditioner is particularly effective for problems in
which the transformation $\hat{T}^h(\theta)$ is a monotonically decreasing function which has small values in the
high frequencies. In these cases the minimization process does not smooth the errors effectively, and
therefore, without the use of a proper preconditioner, high-frequency oscillatory errors in the design
variables are slow to converge, and in some cases the algorithm may diverge.
Computational experiments using preconditioning were reported in [5, 6].

Evaluation of the optimization step size $\beta^h$

In a multigrid cycle the relaxation should be effective mainly in the high frequency range. The
relaxation parameter $\beta^h$ is chosen such that the maximum of $|R^h(\theta)|$ in the high frequencies will be
minimal, that is,

$\min_{\beta^h} \max_{\pi/2 \le |\theta| \le \pi} |1 + \beta^h \hat{\mathcal{F}}^h(\theta) \hat{T}^h(\theta)|$.    (8.3)

One can show that if the symbol $\hat{T}^h(\theta)$ does not change sign, $\beta^h$ is given by

$\beta^h = -\frac{2}{(\hat{\mathcal{F}}^h \hat{T}^h)_{\min} + (\hat{\mathcal{F}}^h \hat{T}^h)_{\max}}$    (8.4)

where $(\hat{\mathcal{F}}^h \hat{T}^h)_{\min}$ and $(\hat{\mathcal{F}}^h \hat{T}^h)_{\max}$ are the minimal and maximal values of $\hat{\mathcal{F}}^h(\theta)\hat{T}^h(\theta)$ in the range
($\pi/2 \le |\theta| \le \pi$). In most practical problems the symbol $\hat{\mathcal{F}}^h \hat{T}^h(\theta)$ is monotonic, thus $\beta^h$ is given
by

$\beta^h = -\frac{2}{\hat{\mathcal{F}}^h(\pi/2)\hat{T}^h(\pi/2) + \hat{\mathcal{F}}^h(\pi)\hat{T}^h(\pi)}$.    (8.5)

In this way the size of the minimization step amplitude, $\beta^h$, is found by Fourier analysis instead of
using a line search, thus reducing the computational cost of each optimization step. However, this
was demonstrated in practice only for linear problems (see Sec. 9.1.2).
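
The min-max choice of the step size can be tried numerically. The sketch below samples a symbol on
the high-frequency range, forms the step size from the minimal and maximal sampled values, and
reports the resulting smoothing factor. The symbol $-2\theta^2$ is a hypothetical stand-in for the
preconditioned symbol, chosen only because it keeps one sign and is monotonic; the function names
are likewise illustrative.

```python
import math


def high_frequency_samples(n=200):
    """Sample points covering pi/2 <= theta <= pi."""
    return [math.pi / 2 + (math.pi / 2) * i / n for i in range(n + 1)]


def optimal_step(symbol, n=200):
    """Step size -2 / (min + max) of the sampled symbol."""
    values = [symbol(t) for t in high_frequency_samples(n)]
    return -2.0 / (min(values) + max(values))


def smoothing_factor(symbol, beta, n=200):
    """Largest |1 + beta * symbol(theta)| over the high-frequency samples."""
    return max(abs(1.0 + beta * symbol(t)) for t in high_frequency_samples(n))


def example_symbol(theta):
    """Hypothetical one-signed, monotonic symbol."""
    return -2.0 * theta**2


beta = optimal_step(example_symbol)
mu = smoothing_factor(example_symbol, beta)
print(beta, mu)
```

For this symbol the relaxation symbol takes equal and opposite values at the two ends of the range,
which is the equal-ripple property that makes the step size optimal in the min-max sense.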


9 Numerical Examples 

We give two numerical examples: an optimal control of Dirichlet data with a fixed geometry and 
an optimal shape design problem where the position of the boundary is the design variable. The 
purpose of the first example is to demonstrate the use of the smoothing analysis to choose the 
best discretization for a given problem given a choice between a few possibilities. It is shown both 
analytically and numerically that different discretizations result in different smoothing rates of the 
minimization process. The purpose of the optimal shape design example is to demonstrate the 
effectiveness of our method. 


9.1 Dirichlet Boundary Control Problem 

The minimization problem is defined by 

$\min_{u(x)} \int_{y=1} \left( \frac{\partial\phi}{\partial n} - f^*(x) \right)^2 dx + \eta \int_{y=1} u^2\, dx$    (9.1)

where $\eta$ is a fixed non-negative parameter, $f^*(x)$ is a given function and where $\phi$ satisfies the state
equation. The state equation is given by

$\Delta\phi = f$ on $\Omega$
$\phi = u(x)$ on $y = 1$    (9.2)
$\phi = \phi_0$ on $y = 0$

where $\Omega = \{0 < x < 1;\ 0 < y < 1\}$ and periodicity is assumed in the $x$ direction. The costate
equation is given by

$\Delta\lambda = 0$ on $\Omega$
$\lambda + 2\left( \frac{\partial\phi}{\partial n} - f^*(x) \right) = 0$ on $y = 1$    (9.3)
$\lambda = 0$ on $y = 0$

The design equation is given by

$A = \frac{\partial\lambda}{\partial n} - 2\eta\phi = 0$ on $y = 1$.    (9.4)


9.1.1 Discretization 

The state, the costate, and the design equations are discretized in four different ways. In three 
discretizations all the unknowns are defined on the vertices of the grid lines as shown in Fig.3A (will 
be referred to as the “vertex grid”). The control variables are defined on the intersections of the 
grid points with the boundary. The normal derivative in the cost function was approximated with 
a first (VX1), a second (VX2) order approximation, and with the use of virtual points outside the 
domain (VX3). The fourth discretization is cell-centered (CC), where the variables are defined on the 
centers of the grid cells as shown in Fig.3B. The grid is extended out of the domain and virtual cell 
centered points are defined neighboring (exterior of) the domain. A Dirichlet boundary condition is 
given for the average of the variables neighboring the boundary. The design variables are defined on 
the centers of the segments connecting the intersection of the grid with the boundary. Note that in
the multigrid scheme, the vertices of the grids on different scales are nested while in the cell-center 
case the cells are nested. 






Figure 3: Vertex (A) and cell centered (B) grids. 


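The different approximations for the normal derivative on the boundary are (index $j = 0$ denotes the
boundary and $j$ increases into the domain):

1) A first order approximation for the normal derivative

VX1:  $\frac{\partial\phi}{\partial n} \approx \frac{\phi_{i,1} - \phi_{i,0}}{h}$    (9.5)

2) A second order approximation for the normal derivative

VX2:  $\frac{\partial\phi}{\partial n} \approx \frac{-\frac{1}{2}\phi_{i,2} + 2\phi_{i,1} - \frac{3}{2}\phi_{i,0}}{h}$    (9.6)

3) A virtual point outside the domain (whose value is determined by applying
the interior operator on the boundary)

VX3:  $\frac{\partial\phi}{\partial n} \approx \frac{\phi_{i,1} - \phi_{i,-1}}{2h}$    (9.7)

4) A cell centered discretization

CC:   $\frac{\partial\phi}{\partial n} \approx \frac{\phi_{i,1/2} - \phi_{i,-1/2}}{h}$    (9.8)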


9.1.2 Analysis: Reduction to the Boundary 

In the following the design equation for the Dirichlet boundary control problem is analyzed in the
discrete space. The second order finite difference approximation of the Laplacian (which was used in
the numerical experiments) is given by

$\Delta^h = \frac{1}{h^2} \begin{bmatrix} & 1 & \\ 1 & -4 & 1 \\ & 1 & \end{bmatrix}$.    (9.9)

The term $e^{\sigma(\theta)}$, in which $\sigma(\theta)$ is the discrete analog of $\sigma(\omega)$ in Eqn. (7.1), satisfies the following second order
equation

$e^{\sigma(\theta)} + (-4 + 2\cos\theta) + e^{-\sigma(\theta)} = 0$.    (9.10)

In order to calculate the Fourier symbol of the design equation (9.4), the symbols of the normal
derivatives (9.5)-(9.8) are given first.
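
Equation (9.10) is equivalent to $\cosh\sigma(\theta) = 2 - \cos\theta$, so the decaying root can be written in closed
form. The short sketch below evaluates it, confirms that it satisfies (9.10), and shows that it
approaches the continuous rate $|\theta|$ for small $\theta$. The function names are ours.

```python
import math


def sigma(theta):
    """Positive root of e^s + (-4 + 2 cos(theta)) + e^-s = 0, i.e. cosh(s) = 2 - cos(theta)."""
    return math.acosh(2.0 - math.cos(theta))


def residual(theta):
    """Left-hand side of (9.10) evaluated at the computed root."""
    s = sigma(theta)
    return math.exp(s) + (-4.0 + 2.0 * math.cos(theta)) + math.exp(-s)


sigma_half_pi = sigma(math.pi / 2)  # acosh(2)
sigma_pi = sigma(math.pi)           # acosh(3)
print(sigma_half_pi, sigma_pi, residual(math.pi))
```

At the highest frequency the discrete decay rate is noticeably smaller than $\pi$, so the discrete
operator damps boundary modes somewhat more slowly than the continuous Laplacian would suggest.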

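The Fourier Symbol of the Normal Derivatives

VX1:  $\hat{\partial}^h_n(\theta) = \frac{e^{-\sigma(\theta)} - 1}{h}$    (9.11)

VX2:  $\hat{\partial}^h_n(\theta) = \frac{-\frac{1}{2} e^{-2\sigma(\theta)} + 2 e^{-\sigma(\theta)} - \frac{3}{2}}{h}$    (9.12)

VX3:  $\hat{\partial}^h_n(\theta) = \frac{e^{-\sigma(\theta)} - e^{\sigma(\theta)}}{2h}$    (9.13)

CC:   $\hat{\partial}^h_n(\theta) = \frac{e^{-\sigma(\theta)/2} - e^{\sigma(\theta)/2}}{h}$    (9.14)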

The Fourier Symbol of the Design Equation 

In terms of the normal derivatives, the transformation $\hat{T}^h(\theta)$ (see Eqn. (9.4)) is given by

$\hat{T}^h(\theta) = -2\left[ \left( \hat{\partial}^h_n(\theta) \right)^2 + \eta \right]$.    (9.15)

As the parameter $\eta$ increases the weight of the low frequencies is increased relative to the high
frequencies.

The amplitude of the minimization step, $\beta^h$, given in Eqn. (8.5) reduces to

$\beta^h = \frac{1}{\left( \hat{\partial}^h_n(\pi/2) \right)^2 + \left( \hat{\partial}^h_n(\pi) \right)^2 + 2\eta}$.    (9.16)

In Fig. 5 the relaxation symbol $R^h(\theta) = 1 + \beta^h \hat{T}^h(\theta)$ is plotted for the above four discretizations. For
all four discretizations the relaxation reduces the high frequency errors by a factor smaller than 0.5.

Fig. 6 depicts the maximal eigenvalue, $|\lambda|_{\max}$, of the two level convergence matrix as a function of
the number of minimization steps, $\nu$, on a given level. The factor by which the error is reduced as a
result of a two level multigrid cycle is bounded by $|\lambda|_{\max}$. Fig. 6 implies that the cell-centered
(CC) and second order vertex (VX2) schemes are expected to perform better than the
other vertex schemes.
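
The statement about the high frequency reduction factor can be reproduced with a few lines of code.
The sketch below assumes that the normal derivative symbols take the forms coded in vx1, vx2, vx3
and cc (scaled by h, which cancels when $\eta = 0$), that $\sigma(\theta)$ is the decaying root of (9.10), and that the
step size is the endpoint formula of (9.16). It evaluates the largest value of $|R^h(\theta)|$ over the high
frequencies only; it does not reproduce the two level analysis behind Fig. 6.

```python
import math


def sigma(theta):
    """Decaying root of (9.10): cosh(sigma) = 2 - cos(theta)."""
    return math.acosh(2.0 - math.cos(theta))


# Assumed symbols of the discrete normal derivative, multiplied by h.
def vx1(s):
    return math.exp(-s) - 1.0


def vx2(s):
    return -0.5 * math.exp(-2.0 * s) + 2.0 * math.exp(-s) - 1.5


def vx3(s):
    return 0.5 * (math.exp(-s) - math.exp(s))


def cc(s):
    return math.exp(-0.5 * s) - math.exp(0.5 * s)


def smoothing_factor(dn, eta=0.0, n=400):
    """max |R(theta)| over pi/2 <= theta <= pi, with beta from the endpoint formula."""
    d_lo, d_hi = dn(sigma(math.pi / 2)), dn(sigma(math.pi))
    beta = 1.0 / (d_lo**2 + d_hi**2 + 2.0 * eta)
    worst = 0.0
    for i in range(n + 1):
        theta = math.pi / 2 + (math.pi / 2) * i / n
        t = -2.0 * (dn(sigma(theta)) ** 2 + eta)  # symbol T of the design equation
        worst = max(worst, abs(1.0 + beta * t))
    return worst


factors = {name: smoothing_factor(f)
           for name, f in [("VX1", vx1), ("VX2", vx2), ("VX3", vx3), ("CC", cc)]}
print(factors)
```

Under these assumptions every discretization gives a factor below 0.5, in line with the statement
above.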


9.1.3 Convergence Performance 

In the numerical tests the problem (9.1)-(9.2) was solved for the four discretizations (9.5)-(9.8). In
all of these problems there was no need to use a preconditioner, $\mathcal{F}$, since the transformation $\hat{T}^h(\theta)$ is
dominated by the high frequencies. The minimization step amplitude, $\beta^h$, given by Eqn. (9.16) was
used in the computations. The multigrid one-shot algorithm was tested using between two and seven
levels (Fig. 4). In all the tests the residuals of the state, the costate and the design equations were
computed in $L^2$ norm.

In the two level test (Table 1), the finest grid was composed of $2^7 \times 2^7$ grid points and the coarsest
grid was composed of $2^6 \times 2^6$ grid points. The parameter $\eta$ in the cost function (9.1) was set to zero.
In Table 1 the two level analysis and the actual convergence rates are compared for the four different
discretizations. The predicted and actual convergence rates agree well.

In the multilevel test the fine grid was composed of $2^m \times 2^m$ points, with $m = 5, 6, 7$, and the
coarsest grid was composed of $2 \times 2$ grid points. The tests with different choices of $m$ were done in
order to check whether the convergence of the algorithm depends on the mesh size. All the results in Fig. 4 were obtained with
a cell-centered discretization. Since the case $\eta = 0$ in (9.1) corresponds to a trivial minimization
problem, the case $\eta = 1$ was tested, although in principle these cases are not different.

In all problems the error was reduced in each V-cycle by an order of magnitude, where each
V-cycle has a computational complexity of $O(N)$ operations.

9.2 An Optimal Shape Design Problem in 2D

Problem Definition 

The minimization problem is defined by

$\min_{\Gamma(x)} \int_{\Gamma(x)} \left( \frac{\partial\phi}{\partial y} - f^*(x) \right)^2 dx$    (9.17)

with $f^*(x)$ a given target function and where $\phi$ satisfies the state equation. The state
equation is given by

$\Delta\phi = f$ on $0 < x < 1;\ \Gamma(x) < y < 1$
$\phi = g(x)$ on $y = 1$    (9.18)
$\phi = \phi_0$ on $y = \Gamma(x)$.

The coordinates of grid points on the boundary $\Gamma = \Gamma^h(x)$ define the design variables. The initial
approximation for the shape, on the coarsest grid, was a flat surface: $\Gamma^h(x) = 0$.

The costate equation

$\Delta\lambda = 0$ on $0 < x < 1;\ \Gamma(x) < y < 1$
$\lambda = 0$ on $y = 1$    (9.19)
$\lambda + 2\cos^2\theta \left( \frac{\partial\phi}{\partial y} - f^* \right) = 0$ on $y = \Gamma(x)$

The design equation

The design equation is simplified by using the costate boundary condition yielding

$A(\phi, \lambda, \theta) = 0$ on $y = \Gamma(x)$    (9.20)


where 


A(cf>, \,9) = — 


d<j) d 

dy do 


(A tan 9 ) 


A d 2 <f> d$d\ 
cos 8 dy 2 dy dn 


(9.21) 


Smoothing Analysis 

The smoothing analysis was done by first performing the small disturbance approximation, resulting
in a fixed domain minimization problem (see Sec. 2.2), and then reducing the problem to the
boundary as explained in Sec. 7. The result is given by the following mapping:


T(u) 




cos 9 0 \i sin(20o)<^|u;| + cos(20 o )M 2 + 0(u) 


For large $\omega$ there exists a positive constant C such that


|TH| > C|uf 


(9.22) 


thus high-frequency errors in the shape are amplified in the residuals of the design equation. It is
this type of problem that is difficult to solve numerically by a single-grid algorithm and for which
multigrid is an ideal accelerator.

Convergence Performance 

The numerical test was done with one application of an FMG algorithm with 2 preliminary cycles,
4 optimization cycles per level and 10 relaxations on the coarsest level (the cycle used is
W(2,1)). The line search uses 10 local relaxations, which are performed on the four grid lines
adjacent to the boundary (on all levels). The depicted residuals are the final residuals.

Tables 2 and 3 give the convergence rate of the OSD Dirichlet problem for different mesh sizes and
different curvatures of the target geometry: Table 2 corresponds to the case $f^*(x) = 0.05 \sin(2\pi x)$ and
Table 3 to the case $f^*(x) = 0.2 e^{-30(x-0.5)^2}$. In both tables $r_x$, $r_p$ and $r_u$ correspond to the residuals of
the state, the costate and the design equations, respectively, and $u - u_{exact}$ is the error in the design
variables. In both cases the initial geometry was a flat boundary ($\Gamma(x) = 0$). The results show a
mesh-size independent convergence rate for both cases.

Numerical experiments show that one application of an FMG scheme, with four cycles per level,
was enough to reach the discretization error on all levels.

A Note on the Cell Centered Finite Volume Discretization 

In problem (9.17)-(9.18), using a cell-centered discretization, the transformation $T^h(\theta)$ vanishes
for the highest frequency, $\theta = \pi$, resulting in high-frequency errors in the shape variables. Therefore
we argue that for this problem a vertex grid is a preferable discretization.

Consider a flat boundary (x axis) and a boundary perturbation in the y direction of the form
$\Gamma_i = \epsilon(-1)^i$, where the index i stands for the i-th point on the boundary and $\epsilon$ is some number
smaller than the mesh size. Such a perturbation does not change the cell center positions,
since the cell center coordinate is the average of the vertex coordinates. Therefore the solutions
of the state, costate and design equations will not detect the perturbation. To keep such
oscillatory errors from entering the boundary, a penalty on the cost function, or a preconditioner on
the design equation residuals, should be applied.
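
The invisibility of the alternating perturbation can be checked with a few lines of code. The following is a minimal sketch, not the authors' discretization: it models only the boundary, takes each cell center height as the average of two neighbouring vertex heights, and uses illustrative values for the amplitude `eps` and the number of vertices.

```python
def cell_centers(vertex_heights):
    """Average each pair of neighbouring boundary vertices."""
    return [0.5 * (a + b) for a, b in zip(vertex_heights, vertex_heights[1:])]

eps = 1.0e-3      # perturbation amplitude (illustrative value)
n_vertices = 9    # number of boundary vertices (illustrative value)

flat = [0.0] * n_vertices
perturbed = [eps * (-1) ** i for i in range(n_vertices)]  # eps * (-1)^i

centers_flat = cell_centers(flat)
centers_perturbed = cell_centers(perturbed)

# The vertices move by eps, but every cell center stays where it was.
print(max(abs(v) for v in perturbed), max(abs(c) for c in centers_perturbed))
```

Because neighbouring vertices move by equal and opposite amounts, every average is unchanged, which is why quantities evaluated at cell centers cannot see this mode.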


[Figure 4, panel A: Optimal Control Problem, $\eta = 0$. Panel B: Optimal Control Problem, $\eta = 1$. Axes: Log(Res) versus V-cycle.]

Figure 4: Convergence rates. A and B depict the Dirichlet boundary control problem
with $\eta = 0$ and $\eta = 1$, respectively. The depicted residuals in A and B are the
average of the computed state, costate, and design equation residuals.

Figure 5: The symbol of the design variable relaxation for the Dirichlet boundary
control problem with $\eta = 0$. Each curve represents a different discretization.

Figure 6: Two-level analysis of asymptotic convergence rates, $|\lambda|_{\max}$, as a function of
the number of optimization steps, $\nu$, for $\eta = 0$. Each symbol represents a different
discretization.

  ν      TLA     NUM      TLA     NUM      TLA     NUM      TLA     NUM
  1     0.327   0.320    0.166   0.166    0.454   0.439    0.333   0.276
  2     0.264   0.255    0.153   0.152    0.283   0.278    0.111   0.105
  3     0.218   0.188    0.120   0.100    0.236   0.229    0.078   0.067
  4     0.189   0.181    0.101   0.080    0.206   0.202    0.061   0.035

Table 1: Two Level Analysis (TLA) versus tested (NUM) convergence rates for the
optimal control of Dirichlet data problem, for various numbers of optimization
steps, $\nu$, on the fine level. Each TLA/NUM column pair corresponds to one of the
four discretizations.


level    $\|r_x\|_2$    $\|r_p\|_2$    $\|r_u\|_2$    $\|u - u_{exact}\|_2$
  2      0.103e-15      0.872e-16      0.676e-07      0.142e-00
  3      0.135e-03      0.713e-03      0.504e-03      0.674e-01
  4      0.431e-04      0.283e-03      0.249e-03      0.326e-01
  5      0.128e-04      0.429e-04      0.855e-04      0.157e-01
  6      0.258e-05      0.967e-05      0.322e-04      0.744e-02
  7      0.445e-06      0.217e-05      0.123e-04      0.374e-02

Table 2: Convergence rates for the optimal shape design problem with a target
distribution given by $f^*(x) = 0.05 \sin(2\pi x)$
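
As a quick reading aid for the last column of Table 2, the sketch below computes the factor by which the design error drops from one level to the next. It assumes, as is usual in multigrid but not stated explicitly here, that the mesh size is halved between consecutive levels; under that assumption a factor near 2 corresponds to an error roughly proportional to the mesh size.

```python
# Errors ||u - u_exact||_2 from the last column of Table 2, levels 2 to 7.
design_errors = [0.142, 0.674e-01, 0.326e-01, 0.157e-01, 0.744e-02, 0.374e-02]

def reduction_factors(values):
    """Ratio of each entry to the next one (coarser level over finer level)."""
    return [coarse / fine for coarse, fine in zip(values, values[1:])]

factors = reduction_factors(design_errors)
for fine_level, factor in zip(range(3, 8), factors):
    print(f"level {fine_level - 1} -> {fine_level}: error reduced by {factor:.2f}")
```

All five factors lie close to 2, which is consistent with the statement that the FMG scheme reaches the discretization error on every level.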


level    $\|r_x\|_2$    $\|r_p\|_2$    $\|r_u\|_2$    $\|u - u_{exact}\|_2$
  2      0.349e-15      0.757e-16      0.218e-07      0.318e-01
  3      0.673e-03      0.901e-03      0.713e-03      0.505e-01
  4      0.281e-03      0.112e-02      0.334e-02      0.371e-01
  5      0.337e-04      0.286e-03      0.100e-02      0.216e-01
  6      0.156e-04      0.156e-03      0.757e-03      0.655e-02
  7      0.165e-05      0.335e-04      0.485e-03      0.381e-02

Table 3: Convergence rates for the optimal shape design problem with a target
distribution given by $f^*(x) = 0.2 e^{-30(x-0.5)^2}$

SUMMARY


Solving boundary value problems with optimal efficiency requires adaptivity and multilevel tech- 
niques. In [6] an implementation of the AFACx algorithm (cf. [8]) is presented that is based on 
rectangular Cartesian grids. This implementation does not allow for the overlap of grids that lie 
on the same level of refinement. We investigate the case in which these grids overlap. A standard 
technique for overlapping grids is the Schwarz algorithm (cf. [12] and [13]). Some ways of using 
the Schwarz algorithm in a standard multigrid scheme are presented. Also, a problem that arises 
in some situations with non-aligned, overlapping grids is described. This situation comes up in 
a natural way when the Schwarz algorithm is used as a relaxation scheme within a multilevel 
algorithm. We identify the reason for the bad convergence and show that by more sophisticated 
interpolation the difficulties can be overcome. Then we present a multiplicative Schwarz algo- 
rithm for a large number of grids that has a high potential for parallelization. Finally we give 
some numerical results for the FACx algorithm with overlapping grids on each refinement level. 
The implementation of the described codes uses C++ and the array class libraries A++ and P++ 
(cf. [4], [5], and [11]). Using the A++/P++ programming environment, it was possible to move 
from a serial code to a parallel code within a few days. 

INTRODUCTION


The Schwarz algorithm is a useful tool when it comes to adaptive refinement. The imple- 
mentation of these complex algorithms can be kept simple by using regular grid structures for 
the discretization. However, any sophisticated refinement strategy will yield highly irregular re- 
finement regions. In [5] and [10] an implementation of the AFACx algorithm is presented. This 
implementation is based on block structured refinement grids that consist of non-overlapping
regular Cartesian grids. There are situations where overlapping grids have advantages, since simpler
grids or substantially fewer blocks can be used. One example is the use of boundary aligned grids
along the boundary and a Cartesian grid in the interior of a domain, as used by
Linden ([7]) and Chesshire and Henshaw ([3]); without overlap, complicated grids would have to be
constructed. Another example is the refinement along a shock with d cells orthogonal to the shock. If
rotated Cartesian overlapping grids are used, a small number of blocks (depending on the curva- 
ture) is sufficient. Using non-overlapping Cartesian grids, the number of blocks is proportional to 
I which also introduces much overhead. Therefore we investigate the use of overlapping grids and 
appropriate solution methods. 


THE SCHWARZ ALGORITHM ON TWO RECTANGULAR GRIDS 


The Classical Schwarz Algorithm 


Given a discrete problem as it arises from an elliptic partial differential equation on two
rectangular overlapping grids $\Omega_h^a$ and $\Omega_h^b$, we have the following:

$$L_h u_h = f_h \quad \text{in } \Omega_h^a \cup \Omega_h^b,$$
$$u_h = g_h \quad \text{on } \partial(\Omega_h^a \cup \Omega_h^b).$$


We define the well-known Schwarz algorithm ([12]) as follows: 


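multiplicative (algorithm 1-m)

1. initialize $u_h^a$ and $u_h^b$

2. $u_h^a \leftarrow MG(L_h^a, u_h^a, f_h^a)$ in $\Omega_h^a$

3. $u_h^b \leftarrow I_a^b u_h^a$ on $\partial\Omega^a \cap \Omega_h^b$

4. $u_h^b \leftarrow MG(L_h^b, u_h^b, f_h^b)$ in $\Omega_h^b$

5. $u_h^a \leftarrow I_b^a u_h^b$ on $\partial\Omega^b \cap \Omega_h^a$

6. go to step 2


additive (algorithm 1-a)

1. initialize $u_h^a$ and $u_h^b$

2. $u_h^a \leftarrow MG(L_h^a, u_h^a, f_h^a)$ in $\Omega_h^a$

3. $u_h^b \leftarrow MG(L_h^b, u_h^b, f_h^b)$ in $\Omega_h^b$

4. $u_h^b \leftarrow I_a^b u_h^a$ on $\partial\Omega^a \cap \Omega_h^b$

5. $u_h^a \leftarrow I_b^a u_h^b$ on $\partial\Omega^b \cap \Omega_h^a$

6. go to step 2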


where MG( , , ) is an approximate solver on a rectangular grid (we use a multilevel V(2,1) cycle).
Numerical examples show that the convergence rates of both algorithms relate to each other as

$$\rho_{add} \approx \sqrt{\rho_{mul}}.$$

Thus, in terms of convergence behavior, algorithm 1-m is two times faster than algorithm 1-a.
On the other hand, in a parallel environment algorithm 1-a might turn out to be the more efficient 
one due to its inherent parallel potential; that is, the solution step (step 2 and 3) can be performed 
simultaneously. For algorithm 1-m, only parallelism in the MG solver (grid partitioning) can be 
exploited. 
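
The square-root relation between the two variants can be reproduced on a toy problem. The sketch below is not the setting of the paper: it assumes the one-dimensional problem u'' = 0 on (0, 1) with zero boundary values, two overlapping intervals (0, beta) and (alpha, 1), and exact subdomain solves in place of the multigrid solver MG. The names `alpha`, `beta` and `schwarz_rates` are introduced only for this illustration.

```python
import math

def schwarz_rates(alpha, beta, sweeps=20):
    """Interface-error reduction per sweep for u'' = 0 on (0, 1), u(0) = u(1) = 0.

    Subdomain a is (0, beta), subdomain b is (alpha, 1), with alpha < beta.
    Exact subdomain solutions are straight lines, so only the two interface
    values need to be tracked: ga = u_a(beta) and gb = u_b(alpha).
    """
    ca = alpha / beta                  # u_a(alpha) / u_a(beta)
    cb = (1.0 - beta) / (1.0 - alpha)  # u_b(beta) / u_b(alpha)

    # Multiplicative: subdomain b uses the value just computed on a.
    ga = 1.0
    for _ in range(sweeps):
        gb = ca * ga
        ga = cb * gb
    rho_mul = ga ** (1.0 / sweeps)

    # Additive: both subdomains use interface data from the previous sweep.
    ga, gb = 1.0, 1.0
    for _ in range(sweeps):
        ga, gb = cb * gb, ca * ga
    rho_add = max(abs(ga), abs(gb)) ** (1.0 / sweeps)
    return rho_mul, rho_add

rho_mul, rho_add = schwarz_rates(alpha=0.4, beta=0.6)
print(rho_mul, rho_add, math.sqrt(rho_mul))
```

In this model two additive sweeps transport the interface data exactly as far as one multiplicative sweep, so the additive rate is the square root of the multiplicative one.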


The Schwarz Algorithm as a Smoother 


Similarly to algorithms 1-m and 1-a we can define a Schwarz-like relaxation scheme in the
following way:

multiplicative (algorithm 2-m)

1. $u_h^a \leftarrow R^k(L_h^a, u_h^a, f_h^a)$ in $\Omega_h^a$

2. $u_h^b \leftarrow I_a^b u_h^a$ on $\partial\Omega^a \cap \Omega_h^b$

3. $u_h^b \leftarrow R^k(L_h^b, u_h^b, f_h^b)$ in $\Omega_h^b$

4. $u_h^a \leftarrow I_b^a u_h^b$ on $\partial\Omega^b \cap \Omega_h^a$

5. go to step 1


additive (algorithm 2-a)

1. $u_h^a \leftarrow R^k(L_h^a, u_h^a, f_h^a)$ in $\Omega_h^a$

2. $u_h^b \leftarrow R^k(L_h^b, u_h^b, f_h^b)$ in $\Omega_h^b$

3. $u_h^b \leftarrow I_a^b u_h^a$ on $\partial\Omega^a \cap \Omega_h^b$

4. $u_h^a \leftarrow I_b^a u_h^b$ on $\partial\Omega^b \cap \Omega_h^a$

5. go to step 1


where $R^k$( , , ) denotes k sweeps of a given relaxation scheme on the rectangular grids (e.g., Gauss-Seidel).
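
For concreteness, one possible choice of the relaxation R is sketched below: k lexicographic Gauss-Seidel sweeps for the 5-point stencil on a single rectangular grid. This is an illustrative stand-in, not the authors' code; the function name, the list-of-lists storage, and the sign convention (the operator is taken as the negative Laplacian) are assumptions made here.

```python
def gauss_seidel_sweeps(u, f, h, k=1):
    """Apply k lexicographic Gauss-Seidel sweeps for -Laplace(u) = f (5-point stencil).

    u and f are lists of lists on a uniform grid with mesh size h; the outermost
    rows and columns of u hold boundary values and are left unchanged.
    """
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(k):
        for i in range(1, n_rows - 1):
            for j in range(1, n_cols - 1):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])
    return u

def max_interior(u):
    """Largest absolute value over the interior grid points."""
    return max(abs(u[i][j])
               for i in range(1, len(u) - 1)
               for j in range(1, len(u[0]) - 1))

# Laplace's equation with zero boundary data: the exact solution is zero,
# so the iterate itself is the error.
n = 9
h = 1.0 / (n - 1)
f = [[0.0] * n for _ in range(n)]
u = [[0.0] * n for _ in range(n)]
for i in range(1, n - 1):
    for j in range(1, n - 1):
        u[i][j] = 1.0

error_before = max_interior(u)
gauss_seidel_sweeps(u, f, h, k=5)
error_after = max_interior(u)
print(error_before, error_after)
```

In a Schwarz-like relaxation such a sweep would be applied on each grid in turn, with the interior boundary values refreshed by interpolation between the sweeps.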


Table 1 shows numerical results for the following test problem: Laplace’s equation on a
rectangular domain that consists of two overlapping grids of 65 x 65 points each. Here we use the
standard 5-point stencil discretization, $f_h = 0$ and $g_h = 0$. For the algorithms 1-m and 1-a we use
a V(2,1) multigrid cycle as an approximate solver. For the algorithms 2-m and 2-a we use k = 1
and a V(2,1) multigrid cycle on the whole domain. This means that we do standard coarsening
on each of the grids and treat the two overlapping grids on each coarsening level as one level in
the multigrid sense.


Table 1: Convergence Rates and Overlap Geometry

  ovl      1-m      1-a      2-m      2-a
   2h     0.807    0.899    0.606    0.788
   4h     0.655    0.816    0.348    0.615
   8h     0.432    0.667    0.154    0.384
  16h     0.199    0.451    0.092    0.169
  32h     0.057    0.211    0.049    0.051


For small overlap areas we observe that algorithm 2-m has the best convergence rate. Algorithm 
2-a is comparable to algorithm 1-m, but due to the additivity it has a higher parallel potential. 
Similar results hold for other overlap geometries ([1]). 
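
The relation between the rates of algorithms 1-a and 1-m can be checked directly against the entries of Table 1. The short script below only re-evaluates the tabulated numbers; the variable names are introduced here for illustration.

```python
import math

# Columns "1-m" and "1-a" of Table 1, for overlaps 2h, 4h, 8h, 16h, 32h.
overlaps = ["2h", "4h", "8h", "16h", "32h"]
rate_1m = [0.807, 0.655, 0.432, 0.199, 0.057]
rate_1a = [0.899, 0.816, 0.667, 0.451, 0.211]

deviations = []
for ovl, mul, add in zip(overlaps, rate_1m, rate_1a):
    predicted = math.sqrt(mul)          # square root of the multiplicative rate
    deviations.append(abs(predicted - add))
    print(f"{ovl:>4}: sqrt(1-m) = {predicted:.3f}, measured 1-a = {add:.3f}")
```

For the four smaller overlaps the square root of the multiplicative rate matches the additive rate to within 0.01; the largest overlap shows a somewhat bigger gap.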


A major disadvantage of the Schwarz-like relaxation scheme is that a bad overlap situation
arises naturally on coarser levels. Given two aligned fine-level overlapping grids of the
same mesh size, the situation illustrated in figure 1 is very likely to occur on a coarser level. At
the re-entrant corner, in both the x- and the y-direction, the distance from the boundary of one grid
to the closest parallel interior grid line of the other grid is small
compared with the mesh size. We observe a strong coupling between those two grid points that lie
on an interior boundary closest to the physical boundary of the domain and, thus, a convergence
rate close to 1.



Figure 1: Bad Overlap Geometry 


An Example for the Case of a Bad Overlap Geometry 


The following example illustrates the situation described above. We consider the simple case of
two overlapping grids of size 3 x 3 grid points. We discretize Poisson’s equation on the union of the
two grids by using the standard 5-point stencil on each of the grids. We apply the multiplicative
Schwarz algorithm with an exact solver (steps 2 and 4 in algorithm 1-m). The transfer of boundary
values (steps 3 and 5 of algorithm 1-m) is done by bilinear interpolation.

The spectral radius of the multiplicative Schwarz iteration operator $M_{Schwarz}$ depends on the mesh size
h and the smallest distance d between the lines of the two grids:

$$\rho(M_{Schwarz}) = \frac{(h-d)^4}{h^4}.$$

For $d \searrow 0$, clearly $\rho(M_{Schwarz}) \nearrow 1$, so this example suggests that the multiplicative Schwarz
algorithm is ill conditioned in the case of such overlap geometry. If we use quadratic interpolation
in the x- and y-directions we observe the same behavior. Numerical experiments with various grid
sizes lead to the same result.
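
To get a feeling for how quickly the rate degrades, the formula for the 3 x 3 example can simply be evaluated. The helper names and the sample values of d below are illustrative choices, not taken from the paper.

```python
import math

def schwarz_spectral_radius(h, d):
    """Spectral radius (h - d)^4 / h^4 for the 3 x 3 two-grid example."""
    if not 0.0 <= d <= h:
        raise ValueError("expected 0 <= d <= h")
    return ((h - d) / h) ** 4

def sweeps_for_tenfold_reduction(rho):
    """Number of iterations needed to reduce the error by a factor of 10."""
    return math.ceil(math.log(0.1) / math.log(rho))

h = 1.0
for d in (0.5, 0.25, 0.1, 0.01):
    rho = schwarz_spectral_radius(h, d)
    print(f"d/h = {d / h:5.2f}: rho = {rho:.4f}, "
          f"sweeps per decimal digit = {sweeps_for_tenfold_reduction(rho)}")
```

With the grid lines half a mesh width apart the iteration converges fast, whereas for nearly coincident lines dozens of sweeps are needed per digit of accuracy.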


An Approach to Improve the Convergence in the Case of a Bad Overlap Geometry 


If we modify the interpolation used to transfer the interior boundaries near the physical
boundary of the domain, we can overcome the bad convergence behavior that arises from a bad overlap
geometry. The bad convergence rate is due to the strong coupling between the two points that
are closest to the re-entrant corner. As illustrated in figure 2 for the case of linear interpolation,
we use p2 and p3 to interpolate the value on p1. Here the value on p2 is obtained by linear
interpolation from the values on p4 and p5. The value on p3, in contrast to the interpolation used
in the interior of the domain, is an exact boundary value, or it can be obtained by using the value
on the closest grid point on the physical boundary.


[Figure 2: sketch of the points p1 to p5 near the re-entrant corner.]

Figure 2: Interpolation of Boundary Values near a Corner


This strategy can be extended to higher order interpolation. In that case, the value on p3 has
to be obtained by extrapolation from values on the physical boundary of the domain or must be
given as a boundary value in the corner. Numerical examples show that, for instance in the case
of linear interpolation in both the x- and y-directions, the use of the described modification of the
interpolation yields a convergence rate that is smaller by a factor of approximately 0.5 than the
convergence rate with the usual interpolation.


MULTIPLICATIVE SCHWARZ ALGORITHM ON MORE THAN TWO RECTANGULAR GRIDS


In general, a refinement algorithm that produces overlapping rectangular grids will produce
more than two overlapping grids. Here we show one way to get the benefits of the multiplicative
Schwarz algorithm in a parallel environment. The idea is an extension of the coloring idea in
relaxation schemes. In order to define a 4-step multiplicative Schwarz algorithm we have to define
an overlap coloring on a family of overlapping rectangular domains $\{\Omega^i\}_{i=1 \ldots k}$.

Definition 1 (overlap coloring). We call $\phi : \{\Omega^i\} \to \mathbb{N}$ an n-overlap coloring if for two domains
$\Omega^i$ and $\Omega^j$, with $i \ne j$ and $\Omega^i \cap \Omega^j \ne \emptyset$,

$$\phi(\Omega^i) \ne \phi(\Omega^j)$$

and

$$\phi(\Omega^i) \le n$$

hold.

The following theorem is an application of the well-known four-color theorem. 

Theorem 1. Let $\bigcup_{i=1}^{k} \Omega^i$ be a connected set. If

1. $\forall i, j \in \{1 \ldots k\}$, $i \ne j$: $\Omega^i - \Omega^j$ is a connected set

2. $\forall i, j \in \{1 \ldots k\}$ with $\Omega^i \cap \Omega^j \ne \emptyset$, we have $\Omega^i \cap \Omega^j \not\subset \bigcup_{l \in \{1 \ldots k\} - \{i,j\}} \Omega^l$

then there exists a 4-overlap coloring for $\{\Omega^i\}_{i=1 \ldots k}$.

Proof. First we construct a family $\{\tilde\Omega^i\}_{i=1 \ldots k}$ of mutually disjoint open sets with

$$\bigcup_{i=1}^{k} \tilde\Omega^i = \bigcup_{i=1}^{k} \Omega^i.$$

This is done by defining the $\tilde\Omega^i$ recursively as

$$\tilde\Omega^i = \Omega^i - \bigcup_{j=1}^{i-1} \Omega^j.$$

Now we show that $\tilde\Omega^i$ is connected for all $i \in \{1 \ldots k\}$. For $i \in \{1, 2\}$ this is trivial; for $i \ge 3$ we show
this by contradiction. Suppose there is an $i \in \{3 \ldots k\}$ such that $\tilde\Omega^i$ is not connected. Then, since
$\forall j, l \in \{1 \ldots k\}$, $j \ne l$: $\Omega^j \ne \Omega^l$, and since all $\Omega^i$ are rectangular, we can conclude that there is a pair
of indices $j, l < i$ such that $\Omega^j \cap \Omega^l \subseteq \Omega^i$, which is a contradiction to the second hypothesis in
the theorem. Since the $\tilde\Omega^i$ are not empty we can apply the four-color theorem to obtain a coloring
of the constructed domains with four colors. Now we can use this coloring for $\{\Omega^i\}_{i=1 \ldots k}$, and by
construction of the $\tilde\Omega^i$ this is a 4-coloring of the $\Omega^i$. □


This result also holds if we replace the domains by rectangular grids, so we can obtain
a 4-coloring of a family of overlapping rectangular grids. Since grids of the same color do
not overlap, we can solve on those nonoverlapping grids simultaneously and process the groups of
equally colored grids in a multiplicative manner. Given a family of overlapping grids $\{\Omega_h^i\}_{i=1 \ldots k}$
and a 4-coloring $\phi$ for them, we can define the 4-step multiplicative Schwarz algorithm in the following
way:

1. Initialize $u_h^i$, $i = 1 \ldots k$.

2. For c = 1 ... 4

(a) $u_h^i \leftarrow MG(L_h^i, u_h^i, f_h^i)$ for all i such that $\Omega_h^i \in \phi^{-1}(c)$.

(b) Update the boundary points of all grids that intersect with the grids of color c.
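
The bookkeeping behind such a colored sweep can be sketched as follows. Note the assumptions: the coloring below is a simple greedy pass, not the construction used in the proof of Theorem 1, so it yields a valid overlap coloring but is not guaranteed to stay within four colors; rectangles are given as hypothetical tuples (x0, y0, x1, y1), and the solver itself is omitted.

```python
from collections import defaultdict

def overlaps(r, s):
    """True if the open rectangles r and s, given as (x0, y0, x1, y1), intersect."""
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]

def greedy_overlap_coloring(rects):
    """Give each rectangle the smallest color (1, 2, ...) unused by overlapping ones."""
    colors = []
    for i, r in enumerate(rects):
        used = {colors[j] for j in range(i) if overlaps(r, rects[j])}
        c = 1
        while c in used:
            c += 1
        colors.append(c)
    return colors

def color_schedule(colors):
    """Group grid indices by color; each group can be processed simultaneously."""
    groups = defaultdict(list)
    for i, c in enumerate(colors):
        groups[c].append(i)
    return [groups[c] for c in sorted(groups)]

# A chain of four rectangles in which only neighbours overlap (made-up geometry).
rects = [(0, 0, 2, 1), (1.5, 0, 3.5, 1), (3, 0, 5, 1), (4.5, 0, 6.5, 1)]
colors = greedy_overlap_coloring(rects)
schedule = color_schedule(colors)
print(colors, schedule)
```

Each inner list of `schedule` corresponds to one pass of step 2: the grids in it do not overlap and could be solved in parallel, after which the boundary points of their neighbours are updated.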

In the context of adaptive grid refinement this algorithm yields a higher parallel potential than 
multiplicative processing of the grids. We have not investigated the numerical properties of such 
an algorithm, but the numerical results with two overlapping grids suggest that a multiplicative 
processing of the refinement grids has better smoothing and convergence properties than an ad- 
ditive algorithm. Theorem 1 provides only the existence of a 4-overlap coloring under fairly weak 
conditions in two dimensions. In three dimensions such a general result does not hold. 

THE SCHWARZ ALGORITHM AS A SMOOTHER IN FACx


Here we want to investigate the Schwarz algorithm as a smoother in the FAC algorithm. A 
good reference to FAC and AFAC is [8] where a detailed description of the two algorithms is 
given. In [6] an implementation of the AFAC algorithm is described that is based on regular 
block-structured grids. We extend this idea in such a way that our implementation allows for 
overlap among the grids that represent one level of refinement. As a relaxation scheme we use 
algorithms 2-m or 2-a. In table 2 we give some numerical results for the FAC algorithm on a grid
as in figure 3.



Figure 3: Grid Geometry and Exact Solution for Test Problem 


Table 2: Convergence Rates for Test Problem

               additive            multiplicative
  levels    (2,1)    (4,1)        (2,1)    (4,1)
    2       0.131    0.121        0.125    0.119
    3       0.131    0.116        0.117    0.116
    4       0.118    0.119        0.104    0.119


We solve Laplace’s equation (standard 5 point discretization) on the unit square with FAC using 
algorithm 2-m or 2-a as a relaxation scheme on the refinement levels. Within the relaxation 
scheme we use Gauss-Seidel relaxation. The left picture in figure 3 shows the structure of the 
refinement levels, and the right picture shows a function plot of the right hand side that we used 
in this example. The shape of the function plot makes the region of refinement that we chose seem 
reasonable. In table 2 we give the average convergence rate after 10 iterations of a FACx coarse- 
to-fine cycle using different numbers of relaxations. We observe that there is hardly a difference 
between the additive and the multiplicative Schwarz relaxation. This result was already observed 
in the comparison of algorithm 2-m and 2-a used in a multilevel scheme. We also observe that a 
small number of relaxations on each refinement level is sufficient to obtain a convergence rate that 
is comparable to the theoretical convergence rate of FAC ([9]). 

OBJECT ORIENTED IMPLEMENTATION


In our implementation we use C++ and the M++, or A++/P++ array classes (cf. [4], [10], 
and [11]). M++ is a commercial serial array class that provides functionality similar to Fortran 90. 
A++ is an array class developed by Daniel J. Quinlan at the Los Alamos National Laboratories. 
The user interface is compatible with M++, but in contrast to M++ the implementation focuses 
on the speed of the code. The first version of P++, a parallel array class that uses the SPMD 
programming model, was developed by Max Lemke and Daniel J. Quinlan and was based on the 
M++ class library. A new implementation of P++ based on A++ is currently being developed 
by Daniel J. Quinlan. 


Due to the object oriented features of C++ and the possibility of using A++ array statements, 
the implementation was simplified significantly. We employed these features by dividing the FACx 
code into the following parts: 

• A class that provides simple multilevel functionality. 

• A class that provides the functionality for the Schwarz algorithm (e.g. overlap information 
and transfer of internal boundaries of grids). 

• A class that provides the functionality needed for the FACx algorithm (e.g. transfer operators 
between the refinement levels). 

This division made it very simple to change the code from FACx to MLAT [2] since only changes 
in the FACx class had to be done. This illustrates the advantages of an object oriented implemen- 
tation. The code development and maintenance becomes significantly simplified. 


To test the serial version of the code we used AT&T C++ and GNU C++ on a Sun SPARC- 
station 10. Because of the compatibility of A++ and P++ we were able to produce a parallel 
version of our code for the iPSC/860 within a few days. This accomplishment shows impressively 
the advantages of the use of an array class like A++/P++. Due to the preliminary status of the 
development of A++ and P++ at the time of the implementation, we were not able to obtain 
any interesting performance results in a parallel environment. Nevertheless, we conclude that the 
approach of using object oriented programming and the use of parallel array classes can be of 
significant use and can speed up the code development process. 


CONCLUSIONS 


We showed that the Schwarz algorithm can be used as a relaxation scheme in a very efficient 
way. It has some advantages over the approach with block structured refinement grids that consist 
of nonoverlapping rectangular grids. In general, one needs a smaller number of rectangular grids 
to cover a given domain if overlap of the grids is allowed. The fact that a larger number of 
points is needed to cover a domain if overlap is allowed may be less of a disadvantage in a parallel
environment. The 4-step multiplicative Schwarz algorithm further increases the existing parallel 
potential of the additive Schwarz algorithm if it is used on a large number of grids. The problem 
of bad overlap geometry does not occur when FACx or AFACx is used with regular grids and 
aligned refinement grids. Bad overlap geometry can be overcome by a slight modification of the 
interpolation that is used for the transfer of the interior boundary values in the Schwarz scheme. 
Finally, we can report very positive experiences with an object oriented implementation of a 
complex code. 

SUMMARY


This paper develops a least-squares approach to the solution of the incompressible
Navier-Stokes equations in primitive variables. As with our earlier work on Stokes equations,
we recast the Navier-Stokes equations as a first-order system by introducing a velocity
flux variable and associated curl and trace equations. We show that the resulting system is
well-posed, and that an associated least-squares principle yields optimal discretization error
estimates in the $H^1$ norm in each variable (including the velocity flux) and optimal multigrid
convergence estimates for the resulting algebraic system.

INTRODUCTION 


In [3], Cai, Manteuffel, and McCormick developed least-squares functionals for a first-order
system formulation of the Stokes equations (generalized by a pressure-perturbed form of the
continuity equation to allow for linear elasticity). Full ellipticity of the $L^2$-based
least-squares formulation was established in n dimensions by showing that the homogeneous form of
the functional is equivalent to the $(H^1)^{n^2+n+1}$ norm applied to the first-order system
variables (the new $n^2$-component velocity flux variable, the n-component velocity variable, and
the scalar pressure variable). This immediately yields optimal discretization error estimates
for standard finite elements in this $H^1$ product norm, as well as optimal convergence
estimates for multigrid methods applied to the resulting discrete systems.

This work was sponsored by the Air Force Office of Scientific Research under grant number
AFOSR-91-0156, the National Science Foundation under grant number DMS-8704169, and the Department
of Energy under grant number DE-FG03-93ER25165.

The aim of this paper is to extend this methodology to the primitive variable form of
the incompressible Navier-Stokes equations in two and three dimensions. We do this in the
same way that the Stokes equations were reformulated based on the velocity flux variable,
but now we include the nonlinear convection term in the first-order system. We recast the
Euler-Lagrange equations for the least-squares principle in the canonical form
$F(\lambda, \mathcal{U}) \equiv \mathcal{U} + T \cdot G(\lambda, \mathcal{U}) = 0$, where T is the least-squares solution operator for the Stokes equations.
This allows us to apply conventional abstract theory and our Stokes results to obtain optimal
discretization and multigrid solution estimates for each variable (including velocity flux) in
the $H^1$ norm.

These are the first $H^1$ product ellipticity results for the Navier-Stokes equations that
admit the practical velocity boundary conditions. Earlier work on the Stokes equations by
Chang [5] used an acceleration variable analogous to our velocity flux; however, velocity
was eliminated from the first-order system, which seems to prevent its extension to the
Navier-Stokes equations, and, in any case, the formulation is limited to two dimensions. In
[2], Bochev and Gunzburger developed a least-squares approach for the velocity-vorticity-pressure
form of the Stokes equations, but showed that it does not allow $H^1$ product ellipticity
in the velocity boundary condition case (a mesh weighting was introduced in the functional
to obtain optimal estimates). Finally, Bochev [1] extended this methodology to the
Navier-Stokes equations, but established $H^1$ product ellipticity only for nonstandard
boundary conditions.

This paper is organized as follows: in the next section, we introduce the Navier-Stokes 
equations and their first-order form; in Section 3, we develop the associated least-squares 
principle; in Section 4, we recast this principle in canonical form and apply a corresponding 
abstract theory to derive error estimates; in Section 5, we establish well-posedness of the 
least-squares canonical form based on regularity assumptions for the original Navier-Stokes 
equations; and, in the final section, we develop a simple but optimal multigrid solver for 
the resulting discrete system. Throughout the paper we use bold face to denote vectors and 
underlined bold face style to denote matrices. 

VELOCITY-PRESSURE-VELOCITY-FLUX NAVIER-STOKES EQUATIONS

The dimensionless equations governing the steady incompressible flow of a viscous fluid
in a bounded domain $\Omega \subset \mathbb{R}^n$, n = 2, 3, may be written in the form

$$-\nu \Delta \mathbf{u} + (\nabla \mathbf{u}^t)^t \mathbf{u} + \nabla p = \mathbf{f} \quad \text{in } \Omega \qquad (1)$$

$$\nabla^t \mathbf{u} = 0 \quad \text{in } \Omega, \qquad (2)$$

where $\mathbf{u}$, p, and $\mathbf{f}$ denote velocity, pressure, and given body force, respectively, and $\nu$ is the
inverse of the Reynolds number, $\lambda$. The velocity variable $\mathbf{u}$ is a column vector with scalar
components $u_i$, so that $\nabla \mathbf{u}^t$ is a matrix with columns $\nabla u_i$. Together with equations (1)-(2),
we consider the velocity boundary condition

$$\mathbf{u} = \mathbf{0} \quad \text{on } \Gamma, \qquad (3)$$

where $\Gamma$ is the boundary of $\Omega$. For uniqueness, we also impose the baseline pressure condition

$$\int_\Omega p \, d\Omega = 0. \qquad (4)$$

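To formulate the least-squares method, equations (1)-(2) will be transformed into an
equivalent first-order system. The first step in this process is to introduce the velocity flux variable

$$\underline{\mathbf{U}} = \nabla \mathbf{u}^t, \qquad (5)$$

which is a matrix with entries $U_{ij} = \partial u_j / \partial x_i$, $1 \le i, j \le n$. Then

$$(\nabla^t \underline{\mathbf{U}})^t = \Delta \mathbf{u}$$

and it is easy to see that the new variable satisfies the identities

$$\mathrm{tr}\, \underline{\mathbf{U}} = 0, \quad \nabla \times \underline{\mathbf{U}} = \underline{\mathbf{0}} \quad \text{in } \Omega$$

and

$$\mathbf{n} \times \underline{\mathbf{U}} = \underline{\mathbf{0}} \quad \text{on } \Gamma, \qquad (6)$$

where $\mathrm{tr}\, \underline{\mathbf{U}} = \sum_{i=1}^{n} U_{ii}$ and $\mathbf{n}$ is the outward unit normal on $\Gamma$. Furthermore, the nonlinear
term in (1) takes the particularly simple form

$$(\nabla \mathbf{u}^t)^t \mathbf{u} = \underline{\mathbf{U}}^t \mathbf{u}.$$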
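
The index conventions are easy to get wrong, so a small numerical check may help. The sketch below assumes a particular divergence-free velocity field in two dimensions, chosen only for illustration, approximates the velocity flux with entries $U_{ij} = \partial u_j / \partial x_i$ by central differences, and compares $\underline{\mathbf{U}}^t \mathbf{u}$ with the convection term worked out by hand.

```python
def velocity(x, y):
    """Divergence-free test field u = (x^2 y, -x y^2) (illustrative choice)."""
    return [x * x * y, -x * y * y]

def velocity_flux(x, y, step=1.0e-5):
    """Central-difference approximation of U with U[i][j] = d u_j / d x_i."""
    points = [((x + step, y), (x - step, y)), ((x, y + step), (x, y - step))]
    U = []
    for plus, minus in points:
        up, um = velocity(*plus), velocity(*minus)
        U.append([(up[j] - um[j]) / (2.0 * step) for j in range(2)])
    return U

def transpose_times(U, u):
    """Return U^t u, i.e. component i is sum_j U[j][i] * u[j]."""
    return [sum(U[j][i] * u[j] for j in range(2)) for i in range(2)]

x, y = 0.3, 0.7
u = velocity(x, y)
U = velocity_flux(x, y)

trace_U = U[0][0] + U[1][1]          # equals div u, zero for this field
convection = transpose_times(U, u)   # U^t u

# Convection term (u . grad) u worked out by hand for this field.
exact = [u[0] * 2 * x * y + u[1] * x * x,
         u[0] * (-y * y) + u[1] * (-2 * x * y)]
print(trace_U, convection, exact)
```

The trace vanishes because the field is divergence free, and the two evaluations of the nonlinear term agree up to rounding error.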


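As a result, (1)-(2) can be replaced by the first-order system

$$-\nu (\nabla^t \underline{\mathbf{U}})^t + \underline{\mathbf{U}}^t \mathbf{u} + \nabla p = \mathbf{f} \quad \text{in } \Omega \qquad (7)$$

$$\nabla^t \mathbf{u} = 0 \quad \text{in } \Omega \qquad (8)$$

$$\underline{\mathbf{U}} - \nabla \mathbf{u}^t = \underline{\mathbf{0}} \quad \text{in } \Omega \qquad (9)$$

$$\nabla (\mathrm{tr}\, \underline{\mathbf{U}}) = \mathbf{0} \quad \text{in } \Omega \qquad (10)$$

$$\nabla \times \underline{\mathbf{U}} = \underline{\mathbf{0}} \quad \text{in } \Omega \qquad (11)$$

along with conditions (3), (4), and (6).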


The second step in the formulation of a suitable first-order system is to scale the
momentum equation by the Reynolds number and replace the data $\mathbf{f}$ by functions with known
boundary values. The resulting form of the equations will provide insight into the overall
approach and facilitate error analysis of the corresponding least-squares method. For this
purpose, we assume that $\mathbf{f} \in L^2(\Omega)^n$ and consider the unique solution $(\mathbf{u}_0, p_0)$ of the scaled
Stokes problem

$$\begin{aligned} -\Delta \mathbf{u} + \nabla p &= \tfrac{1}{\nu} \mathbf{f} && \text{in } \Omega \\ \nabla^t \mathbf{u} &= 0 && \text{in } \Omega \\ \mathbf{u} &= \mathbf{0} && \text{on } \Gamma \\ \int_\Omega p \, d\Omega &= 0. \end{aligned} \qquad (12)$$
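
Equation (7) is then replaced by

$$-(\nabla^t \underline{\mathbf{U}})^t + \tfrac{1}{\nu} (\underline{\mathbf{U}} + \underline{\mathbf{U}}_0)^t (\mathbf{u} + \mathbf{u}_0) + \nabla p = \mathbf{0} \quad \text{in } \Omega, \qquad (13)$$

which determines the perturbation $(\underline{\mathbf{U}}, \mathbf{u}, \nu p)$ from the Stokes solution $(\nabla \mathbf{u}_0^t, \mathbf{u}_0, \nu p_0)$. To
summarize, our reformulation yields the system

$$-(\nabla^t \underline{\mathbf{U}})^t + \tfrac{1}{\nu} (\underline{\mathbf{U}} + \underline{\mathbf{U}}_0)^t (\mathbf{u} + \mathbf{u}_0) + \nabla p = \mathbf{0} \quad \text{in } \Omega \qquad (14)$$

$$\nabla^t \mathbf{u} = 0 \quad \text{in } \Omega \qquad (15)$$

$$\underline{\mathbf{U}} - \nabla \mathbf{u}^t = \underline{\mathbf{0}} \quad \text{in } \Omega \qquad (16)$$

$$\nabla (\mathrm{tr}\, \underline{\mathbf{U}}) = \mathbf{0} \quad \text{in } \Omega \qquad (17)$$

$$\nabla \times \underline{\mathbf{U}} = \underline{\mathbf{0}} \quad \text{in } \Omega. \qquad (18)$$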


LEAST-SQUARES METHOD 

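The least-squares functional for first-order system (14)-(18), (3), (4), and (6) is defined
as follows:

$$\begin{aligned} J(\underline{\mathbf{U}}, \mathbf{u}, p) ={}& \left\| -(\nabla^t \underline{\mathbf{U}})^t + \tfrac{1}{\nu} (\underline{\mathbf{U}} + \underline{\mathbf{U}}_0)^t (\mathbf{u} + \mathbf{u}_0) + \nabla p \right\|_0^2 \\ &+ \|\nabla^t \mathbf{u}\|_0^2 + \|\underline{\mathbf{U}} - \nabla \mathbf{u}^t\|_0^2 + \|\nabla (\mathrm{tr}\, \underline{\mathbf{U}})\|_0^2 + \|\nabla \times \underline{\mathbf{U}}\|_0^2. \end{aligned} \qquad (19)$$

Note that our scaling of (7) by the Reynolds number is equivalent to the use of an $L^2$ norm
weighted by $\lambda$ for the residual of this equation; see also [3].

To define the least-squares method, we need a suitable minimization problem. Let

$$X = \{ (\underline{\mathbf{U}}, \mathbf{u}, p) \in H^1(\Omega)^{n^2} \times H^1(\Omega)^n \times (H^1(\Omega) \cap L_0^2(\Omega)) \mid \mathbf{u} = \mathbf{0},\ \mathbf{n} \times \underline{\mathbf{U}} = \underline{\mathbf{0}} \text{ on } \Gamma \}, \qquad (20)$$

where $L_0^2(\Omega) = \{ p \in L^2(\Omega) \mid \int_\Omega p \, d\Omega = 0 \}$. Then the least-squares principle for functional
(19) is

Find $(\underline{\mathbf{U}}, \mathbf{u}, p) \in X$ such that

$$J(\underline{\mathbf{U}}, \mathbf{u}, p) \le J(\underline{\mathbf{V}}, \mathbf{v}, q) \quad \text{for all } (\underline{\mathbf{V}}, \mathbf{v}, q) \in X. \qquad (21)$$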

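It is easy to see that the Euler-Lagrange equation for this minimization problem is given
by the variational problem

Find $(\underline{\mathbf{U}}, \mathbf{u}, p) \in X$ such that

$$\begin{aligned} \mathcal{B}((\underline{\mathbf{U}}, \mathbf{u}, p), (\underline{\mathbf{V}}, \mathbf{v}, q)) ={}& \Big( -(\nabla^t \underline{\mathbf{U}})^t + \tfrac{1}{\nu} (\underline{\mathbf{U}} + \underline{\mathbf{U}}_0)^t (\mathbf{u} + \mathbf{u}_0) + \nabla p, \\ &\quad -(\nabla^t \underline{\mathbf{V}})^t + \tfrac{1}{\nu} \big( (\underline{\mathbf{U}} + \underline{\mathbf{U}}_0)^t \mathbf{v} + \underline{\mathbf{V}}^t (\mathbf{u} + \mathbf{u}_0) \big) + \nabla q \Big)_0 \\ &+ (\nabla^t \mathbf{u}, \nabla^t \mathbf{v})_0 + (\nabla (\mathrm{tr}\, \underline{\mathbf{U}}), \nabla (\mathrm{tr}\, \underline{\mathbf{V}}))_0 \\ &+ (\underline{\mathbf{U}} - \nabla \mathbf{u}^t, \underline{\mathbf{V}} - \nabla \mathbf{v}^t)_0 + (\nabla \times \underline{\mathbf{U}}, \nabla \times \underline{\mathbf{V}})_0 = 0 \end{aligned} \qquad (22)$$

for all $(\underline{\mathbf{V}}, \mathbf{v}, q) \in X$.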
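
The step from the functional (19) to the variational problem (22) can be spelled out as follows. This is a sketch of the standard first-variation argument; the abbreviations $R$ and $R'$ for the momentum residual and its linearization are introduced here only for illustration and do not appear in the paper.

```latex
\begin{align*}
R(\underline{\mathbf{U}},\mathbf{u},p) &:= -(\nabla^t\underline{\mathbf{U}})^t
   + \tfrac{1}{\nu}(\underline{\mathbf{U}}+\underline{\mathbf{U}}_0)^t(\mathbf{u}+\mathbf{u}_0) + \nabla p, \\
R'(\underline{\mathbf{V}},\mathbf{v},q) &:= -(\nabla^t\underline{\mathbf{V}})^t
   + \tfrac{1}{\nu}\bigl((\underline{\mathbf{U}}+\underline{\mathbf{U}}_0)^t\mathbf{v}
   + \underline{\mathbf{V}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) + \nabla q, \\
R(\underline{\mathbf{U}}+s\underline{\mathbf{V}},\,\mathbf{u}+s\mathbf{v},\,p+sq)
   &= R(\underline{\mathbf{U}},\mathbf{u},p) + s\,R'(\underline{\mathbf{V}},\mathbf{v},q)
   + \tfrac{s^2}{\nu}\,\underline{\mathbf{V}}^t\mathbf{v}, \\
\tfrac{1}{2}\,\frac{d}{ds}\,
J(\underline{\mathbf{U}}+s\underline{\mathbf{V}},\,\mathbf{u}+s\mathbf{v},\,p+sq)\Big|_{s=0}
   &= (R, R')_0 + (\nabla^t\mathbf{u},\nabla^t\mathbf{v})_0
   + (\underline{\mathbf{U}}-\nabla\mathbf{u}^t,\ \underline{\mathbf{V}}-\nabla\mathbf{v}^t)_0 \\
   &\quad + (\nabla(\mathrm{tr}\,\underline{\mathbf{U}}),\nabla(\mathrm{tr}\,\underline{\mathbf{V}}))_0
   + (\nabla\times\underline{\mathbf{U}},\nabla\times\underline{\mathbf{V}})_0 .
\end{align*}
```

Requiring this derivative to vanish for every direction $(\underline{\mathbf{V}}, \mathbf{v}, q) \in X$ gives exactly the condition stated in (22); only the momentum residual contributes a term that depends on the current iterate, because all other residuals are linear.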

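Let $X_h$ denote a finite-dimensional subspace of X. Then the least-squares discretization
method for the Navier-Stokes equations is defined by the following discrete variational
problem:

Find $(\underline{\mathbf{U}}^h, \mathbf{u}^h, p^h) \in X_h$ such that

$$\mathcal{B}((\underline{\mathbf{U}}^h, \mathbf{u}^h, p^h), (\underline{\mathbf{V}}^h, \mathbf{v}^h, q^h)) = 0 \quad \text{for all } (\underline{\mathbf{V}}^h, \mathbf{v}^h, q^h) \in X_h. \qquad (23)$$

It is easy to see that the discrete variational problem (23) corresponds to the necessary
condition for the following discrete least-squares principle for (19):

Find $(\underline{\mathbf{U}}^h, \mathbf{u}^h, p^h) \in X_h$ such that

$$J(\underline{\mathbf{U}}^h, \mathbf{u}^h, p^h) \le J(\underline{\mathbf{V}}^h, \mathbf{v}^h, q^h) \quad \text{for all } (\underline{\mathbf{V}}^h, \mathbf{v}^h, q^h) \in X_h. \qquad (24)$$

For the space $X_h$, we assume the following approximation property: there exists an integer
$d > 0$ such that, for all $\underline{\mathbf{U}} \in H^{d+1}(\Omega)^{n^2}$, $\mathbf{u} \in H^{d+1}(\Omega)^n$, and $p \in H^{d+1}(\Omega)$, one can find
$(\underline{\mathbf{U}}^h, \mathbf{u}^h, p^h) \in X_h$ with

$$\|\underline{\mathbf{U}} - \underline{\mathbf{U}}^h\|_r + \|\mathbf{u} - \mathbf{u}^h\|_r + \|p - p^h\|_r \le C h^{d+1-r} \left( \|\underline{\mathbf{U}}\|_{d+1} + \|\mathbf{u}\|_{d+1} + \|p\|_{d+1} \right), \quad r = 0, 1. \qquad (25)$$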


DISCRETIZATION ERROR ESTIMATES 


The main goal of this section is to derive error estimates for least-squares method (23).
For this purpose, we show how to cast nonlinear problems (22) and (23) in the respective
canonical forms

\[ F(\lambda,\mathcal{U}) \equiv \mathcal{U} + T\cdot G(\lambda,\mathcal{U}) = 0 \tag{26} \]

and

\[ F^h(\lambda,\mathcal{U}^h) \equiv \mathcal{U}^h + T_h\cdot G(\lambda,\mathcal{U}^h) = 0. \tag{27} \]

This will allow us to apply the abstract approximation theory of [6]. The following function
spaces will be needed in the sequel (with some nonnegative integer $m$):

\[ X_m = [H^{m+1}(\Omega)^{n^2} \times H^{m+1}(\Omega)^n \times H^{m+1}(\Omega)] \cap X, \tag{28} \]

\[ Y = X^*, \tag{29} \]

\[ Z = L^{3/2}(\Omega)^{n^2} \times L^{3/2}(\Omega)^n \times L^{3/2}(\Omega), \tag{30} \]

where $X^*$ denotes the dual of $X$ with respect to the $L^2$ inner product. The approximation
in (27) is introduced by way of the operator $T_h$. Therefore, the error estimates will depend
largely on the nature of the operator $T$ and its approximation $T_h$. The basic idea is to define $T$
to be the least-squares Stokes solution operator and $T_h$ to be its finite element approximation.
The approximation properties of these choices have been studied in [3]. Now, once $T$ is
known, the operator $G$ is then defined by the remaining terms in (22). The key is that the
corresponding nonlinear part for $T_h$ is also $G$, as we assert in our first lemma.

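With this in mind, we make the identifications $\mathcal{U} = (\mathbf{U},\mathbf{u},p)$,
$\mathcal{U}^h = (\mathbf{U}^h,\mathbf{u}^h,p^h)$, $\mathcal{V} = (\mathbf{V},\mathbf{v},q)$,
$\mathcal{V}^h = (\mathbf{V}^h,\mathbf{v}^h,q^h)$, and $\lambda = 1/\nu$, and we assume that
$\lambda \in \Lambda$, where $\Lambda$ is a compact subset of $\mathbb{R}_+$. We then introduce the following:

$T : Y \to X$ defined by $\mathcal{U} = T\mathbf{g}$ for $\mathbf{g} \in Y$ if and only if

\[
\begin{aligned}
\mathcal{B}_S(\mathcal{U},\mathcal{V}) \equiv{}
 & \bigl( -(\nabla^t\mathbf{U})^t + \nabla p,\; -(\nabla^t\mathbf{V})^t + \nabla q \bigr)_0 \\
 & + (\nabla^t\mathbf{u}, \nabla^t\mathbf{v})_0 + (\nabla(\mathrm{tr}\,\mathbf{U}), \nabla(\mathrm{tr}\,\mathbf{V}))_0 \\
 & + (\mathbf{U}-\nabla\mathbf{u}^t, \mathbf{V}-\nabla\mathbf{v}^t)_0 + (\nabla\times\mathbf{U}, \nabla\times\mathbf{V})_0 \\
 ={} & (\mathbf{g}_1,\mathbf{V}) + (\mathbf{g}_2,\mathbf{v}) + (g_3,q)
\end{aligned}
\tag{31}
\]

for all $(\mathbf{V},\mathbf{v},q) \in X$;

$T_h : Y \to X_h$ defined by $\mathcal{U}^h = T_h\mathbf{g}$ for $\mathbf{g} \in Y$ if and only if

\[ \mathcal{B}_S(\mathcal{U}^h,\mathcal{V}^h) = (\mathbf{g}_1,\mathbf{V}^h) + (\mathbf{g}_2,\mathbf{v}^h) + (g_3,q^h) \quad \text{for all } (\mathbf{V}^h,\mathbf{v}^h,q^h) \in X_h; \tag{32} \]

and

$G : \Lambda \times X \to Y$ with $\mathbf{g} = G(\lambda,\mathcal{U})$ for $\mathcal{U} \in X$ if and only if

\[
\begin{aligned}
(\mathbf{g}_1,\mathbf{V}) + (\mathbf{g}_2,\mathbf{v}) + (g_3,q) ={}
 & \Bigl( -(\nabla^t\mathbf{U})^t + \nabla p,\; \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)_0 \\
 & + \Bigl( \frac{1}{\nu}(\mathbf{U}+\mathbf{U}_0)^t(\mathbf{u}+\mathbf{u}_0), \\
 & \qquad -(\nabla^t\mathbf{V})^t + \nabla q + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)_0
\end{aligned}
\tag{33}
\]

for all $(\mathbf{V},\mathbf{v},q) \in X$.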


We then have the following equivalence. 

Lemma 1. Assume that $T$, $T_h$, and $G$ are defined by (31), (32), and (33), respectively.
Then nonlinear problem (22) is equivalent to (26) and discrete nonlinear problem (23) is
equivalent to (27).

Proof. Assume that $\mathcal{U} = (\mathbf{U},\mathbf{u},p)$ solves problem (26) with $T$ and $G$ given by (31) and
(33), respectively. Then $\mathcal{U} = -T\mathbf{g}$ if and only if

\[ \mathcal{B}_S(\mathcal{U},\mathcal{V}) = -(\mathbf{g},\mathcal{V}) \quad \text{for all } \mathcal{V} \in X, \]

and $\mathbf{g} = G(\lambda,\mathcal{U})$ if and only if (33) holds. It follows that $\mathcal{U}$ also solves variational problem
(22). Conversely, if $\mathcal{U}$ solves (22), let $\mathbf{g}$ be defined by (33). Then
$\mathcal{B}_S(\mathcal{U},\mathcal{V}) = -(\mathbf{g},\mathcal{V})$ for all $\mathcal{V} \in X$, i.e., $\mathcal{U} = -T\mathbf{g}$. Thus, (22) and (26) are
equivalent. Proof of the equivalence of (23) and (27) is identical. □
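One way to see the splitting behind Lemma 1 is to abbreviate the four ingredients of the first inner product in (22). The abbreviations $a$, $b$, $N$, and $N'$ below are introduced here only for illustration and are not notation from the paper.

```latex
\begin{align*}
a(\mathcal{U}) &= -(\nabla^t\mathbf{U})^t + \nabla p, &
N(\mathcal{U}) &= \tfrac{1}{\nu}(\mathbf{U}+\mathbf{U}_0)^t(\mathbf{u}+\mathbf{u}_0), \\
b(\mathcal{V}) &= -(\nabla^t\mathbf{V})^t + \nabla q, &
N'(\mathcal{U};\mathcal{V}) &= \tfrac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v}
  + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr), \\
(a+N,\; b+N')_0 &= \underbrace{(a,b)_0}_{\text{part of } \mathcal{B}_S}
  + \underbrace{(a,N')_0 + (N,\; b+N')_0}_{(G(\lambda,\mathcal{U}),\,\mathcal{V}) \text{ by (33)}} .
\end{align*}
```

Adding the remaining linear terms of (22) to $(a,b)_0$ gives $\mathcal{B}_S(\mathcal{U},\mathcal{V})$, so (22) reads $\mathcal{B}_S(\mathcal{U},\mathcal{V}) + (G(\lambda,\mathcal{U}),\mathcal{V}) = 0$ for all $\mathcal{V} \in X$, which is (26) once the Stokes solution operator $T$ is applied.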

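The error estimates for the least-squares method (23) can now be derived from the abstract
approximation theory of [6]. Below we state the main result of this theory in terms
of general $T$ and $T_h$ but specialized to our needs. Here we let $D_{\mathcal{U}}G(\lambda,\mathcal{U})$ and
$D_{\mathcal{U}}F(\lambda,\mathcal{U})$ denote the Fréchet derivatives of $G$ and $F$ with respect to $\mathcal{U}$.

Theorem 1. Assume that $\{(\lambda,\mathcal{U}(\lambda)) \;|\; \lambda \in \Lambda\}$ is a branch of regular solutions of (26),
i.e., that $\lambda \mapsto \mathcal{U}(\lambda)$ is a continuous map $\Lambda \to X$ and that $D_{\mathcal{U}}F(\lambda,\mathcal{U})$ is an isomorphism
of $X$, where $F(\lambda,\mathcal{U}) = 0$ is abstract form (26). Furthermore, assume that $T \in L(Y,X)$
and that $G$ is a $C^2$ map $\Lambda \times X \to Y$, such that all second derivatives of $G$ are bounded on
bounded subsets of $\Lambda \times X$. Finally, assume that there exists a space $Z \subset Y$, with continuous
imbedding, such that $D_{\mathcal{U}}G(\lambda,\mathcal{U}) \in L(X,Z)$ for all $\lambda \in \Lambda$ and $\mathcal{U} \in X$. If approximate
problem (27) is such that

\[ \lim_{h\to 0} \|(T-T_h)\mathbf{g}\|_X = 0 \tag{34} \]

for all $\mathbf{g} \in Y$ and

\[ \lim_{h\to 0} \|T-T_h\|_{L(Z,X)} = 0, \tag{35} \]

then:

1. there exists a neighborhood $\mathcal{O}$ of the origin in $X$ and, for $h$ sufficiently small, a unique
   $C^2$ function $\lambda \mapsto \mathcal{U}^h(\lambda) \in X_h$ such that $\{(\lambda,\mathcal{U}^h(\lambda)) \;|\; \lambda \in \Lambda\}$ is a branch of regular
   solutions of the discrete problem (27) and $\mathcal{U}(\lambda) - \mathcal{U}^h(\lambda) \in \mathcal{O}$ for all $\lambda \in \Lambda$;

2. there exists a positive constant $C$, independent of $h$ and $\lambda$, such that

   \[ \|\mathcal{U}^h(\lambda) - \mathcal{U}(\lambda)\|_X \le C \|(T-T_h)G(\lambda,\mathcal{U}(\lambda))\|_X \tag{36} \]

   uniformly in $\lambda$;

3. if the regular branch is such that $\mathcal{U}(\lambda) \in X_m$ for some integer $m \ge 1$ and
   $\tilde{d} = \min\{d, m\}$, where $d$ is the integer from (25), then

   \[
   \|\mathbf{U}(\lambda)-\mathbf{U}^h(\lambda)\|_1 + \|\mathbf{u}(\lambda)-\mathbf{u}^h(\lambda)\|_1 + \|p(\lambda)-p^h(\lambda)\|_1
    \le C h^{\tilde{d}} \bigl( \|\mathbf{U}(\lambda)\|_{\tilde{d}+1} + \|\mathbf{u}(\lambda)\|_{\tilde{d}+1} + \|p(\lambda)\|_{\tilde{d}+1} \bigr). \tag{37}
   \]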

In the next few lemmas, we verify the hypotheses of Theorem 1 for our least-squares
formulation. We begin by establishing essential properties of the operators $T$ and $T_h$, which
we henceforth assume are defined by (31) and (32), respectively.

Lemma 2. $T \in L(Y,X)$ and $T_h \in L(Y,X_h)$.

Proof. The form $\mathcal{B}_S(\cdot,\cdot)$ is continuous and coercive on $X \times X$ (see [3]) and, by virtue
of the inclusion $X_h \subset X$, it is also continuous and coercive on $X_h \times X_h$. Furthermore,
$(\mathbf{g},\mathcal{V})$ defines a continuous functional $X \ni \mathcal{V} \mapsto \mathbb{R}$ for each $\mathbf{g} \in Y$. Thus, the
Lax-Milgram Theorem implies that, for all $\mathbf{g} \in Y$, variational problems (31) and (32) have
unique respective solutions $\mathcal{U} \in X$ and $\mathcal{U}^h \in X_h$, i.e., $T : Y \to X$ and $T_h : Y \to X_h$ are
well defined linear operators. From

\[ C\|\mathcal{U}\|_X^2 \le \mathcal{B}_S(\mathcal{U},\mathcal{U}) = (\mathbf{g},\mathcal{U}) \le \|\mathbf{g}\|_Y \|\mathcal{U}\|_X, \]

it follows that

\[ \|T\mathbf{g}\|_X = \|\mathcal{U}\|_X \le C^{-1}\|\mathbf{g}\|_Y, \]

i.e., $T$ is in $L(Y,X)$. The proof that $T_h \in L(Y,X_h)$ is similar. □

Before continuing with the approximation properties of $T_h$, consider the choice of $Y$ and
$Z$ in (29) and (30). When $Z \subset Y$ with compact imbedding, the proof of (35) can be
simplified. First, note that $Y$ is not identical to a product of $H^{-1}(\Omega)$ spaces. For instance, with
$\mathbf{U}_i$ denoting the $i$th column of $\mathbf{U}$, then
$\mathbf{U}_i \in \mathbf{H}^1_\tau(\Omega) = \{\mathbf{v} \in H^1(\Omega)^n \;|\; \mathbf{n}\times\mathbf{v} = \mathbf{0} \text{ on } \Gamma\}$,
whose dual is not $H^{-1}(\Omega)^n$. As a result, $Z$ will be compactly imbedded in $Y$ if $L^{3/2}(\Omega)$
is compactly imbedded in the duals of $H^1_0(\Omega)$, $\mathbf{H}^1_\tau(\Omega)$, and $H^1(\Omega)$. The first imbedding
follows from Sobolev's Imbedding Theorem; see, e.g., [6]. Compactness of the other two
imbeddings can be shown along the following lines. Since components of $\mathbf{H}^1_\tau(\Omega)$ and the
space $H^1(\Omega)$ are compactly imbedded in $L^3(\Omega)$ and the adjoint of a compact operator is
compact, it follows that $L^{3/2}(\Omega)^n$ and $L^{3/2}(\Omega)$ are imbedded compactly in the dual spaces
of $\mathbf{H}^1_\tau(\Omega)$ and $H^1(\Omega)$.

Lemma 3. Convergence properties (34) and (35) hold. If, in addition, $\mathbf{g} \in Y$ is such
that $T\mathbf{g} \in X_m$ for some $m \ge 1$ and $\tilde{d} = \min(d, m)$, where $d$ is the integer from (25), then

\[ \|(T-T_h)\mathbf{g}\|_X \le C h^{\tilde{d}} \|T\mathbf{g}\|_{X_{\tilde{d}}}. \tag{38} \]

Proof. First note that (35) follows from (34) when the imbedding $Z \subset Y$ is compact.
It thus suffices to establish (34); that is,

\[ \|(T-T_h)\mathbf{g}\|_X = \|\mathbf{U}-\mathbf{U}^h\|_1 + \|\mathbf{u}-\mathbf{u}^h\|_1 + \|p-p^h\|_1 \to 0 \]

when $h \to 0$. Recall that $T : Y \to X$. Therefore, from $\mathbf{g} \in Y$ it follows that $\mathcal{U} \in X$; that
is, $\mathbf{U} \in H^1(\Omega)^{n^2}$, $\mathbf{u} \in H^1(\Omega)^n$, and $p \in H^1(\Omega)$. Then the above limit follows from Cea's
Lemma and the standard approximation result for $v \in H^1(\Omega)$:

\[ \lim_{h\to 0} \inf_{v^h} \|v - v^h\|_1 = 0. \]

(See [4] for an analogous result for scalar elliptic equations.)

To prove the second part of the lemma, suppose $\mathcal{U} = T\mathbf{g} \in X_m$. Then an immediate
consequence of the continuity and coercivity of $\mathcal{B}_S(\cdot,\cdot)$ and of approximation property (25)
is the Stokes error estimate

\[
\|(T-T_h)\mathbf{g}\|_X = \|\mathbf{U}-\mathbf{U}^h\|_1 + \|\mathbf{u}-\mathbf{u}^h\|_1 + \|p-p^h\|_1
 \le C h^{\tilde{d}} \bigl( \|\mathbf{U}\|_{\tilde{d}+1} + \|\mathbf{u}\|_{\tilde{d}+1} + \|p\|_{\tilde{d}+1} \bigr).
\]
□


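The only hypotheses of Theorem 1 that remain to be verified are the assumptions
concerning the nonlinear operator $G$. For this purpose, we need the weak and strong forms of
the first Fréchet derivative $D_{\mathcal{U}}G(\lambda,\mathcal{U})$ and the weak form of the second Fréchet derivative
$D^2_{\mathcal{U}}G(\lambda,\mathcal{U})$. To determine the weak form of $D_{\mathcal{U}}G(\lambda,\mathcal{U})$, let
$\hat{\mathcal{U}} = (\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p}) \in X$, substitute $\mathcal{U} + \hat{\mathcal{U}}$ into
(33), and expand about $\mathcal{U}$. This yields the following weak representation of $D_{\mathcal{U}}G(\lambda,\mathcal{U})$:

$D_{\mathcal{U}}G(\lambda,\mathcal{U}) : \Lambda \times X \to Y$ defined by $\mathbf{g} = D_{\mathcal{U}}G(\lambda,\mathcal{U})\hat{\mathcal{U}}$ for $\hat{\mathcal{U}} \in X$ if and
only if

\[
\begin{aligned}
(\mathbf{g}_1,\mathbf{V}) + (\mathbf{g}_2,\mathbf{v}) + (g_3,q) ={}
 & \Bigl( -(\nabla^t\mathbf{U})^t + \nabla p,\; \frac{1}{\nu}\bigl(\hat{\mathbf{U}}^t\mathbf{v} + \mathbf{V}^t\hat{\mathbf{u}}\bigr) \Bigr)_0 \\
 & + \Bigl( -(\nabla^t\hat{\mathbf{U}})^t + \nabla\hat{p},\; \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)_0 \\
 & + \Bigl( \frac{1}{\nu}(\mathbf{U}+\mathbf{U}_0)^t(\mathbf{u}+\mathbf{u}_0),\; \frac{1}{\nu}\bigl(\hat{\mathbf{U}}^t\mathbf{v} + \mathbf{V}^t\hat{\mathbf{u}}\bigr) \Bigr)_0 \\
 & + \Bigl( \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr), \\
 & \qquad -(\nabla^t\mathbf{V})^t + \nabla q + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)_0
\end{aligned}
\tag{39}
\]

for all $(\mathbf{V},\mathbf{v},q) \in X$.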

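The strong form of $D_{\mathcal{U}}G(\lambda,\mathcal{U})\hat{\mathcal{U}}$ can be found from (39) using standard integration by
parts:

\[
\begin{aligned}
\mathbf{g}_1 ={} & \frac{1}{\nu}\hat{\mathbf{u}} \Bigl( -(\nabla^t\mathbf{U})^t + \nabla p + \frac{1}{\nu}(\mathbf{U}+\mathbf{U}_0)^t(\mathbf{u}+\mathbf{u}_0) \Bigr)^t \\
 & + \frac{1}{\nu}(\mathbf{u}+\mathbf{u}_0) \Bigl( -(\nabla^t\hat{\mathbf{U}})^t + \nabla\hat{p} + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)^t \\
 & + \frac{1}{\nu}\nabla\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr)^t,
\end{aligned}
\tag{40}
\]

\[
\begin{aligned}
\mathbf{g}_2 ={} & \frac{1}{\nu}\hat{\mathbf{U}} \Bigl( -(\nabla^t\mathbf{U})^t + \nabla p + \frac{1}{\nu}(\mathbf{U}+\mathbf{U}_0)^t(\mathbf{u}+\mathbf{u}_0) \Bigr) \\
 & + \frac{1}{\nu}(\mathbf{U}+\mathbf{U}_0) \Bigl( -(\nabla^t\hat{\mathbf{U}})^t + \nabla\hat{p} + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr),
\end{aligned}
\tag{41}
\]

\[ g_3 = -\frac{1}{\nu}\nabla^t\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr), \tag{42} \]

for all $(\mathbf{V},\mathbf{v},q) \in X$.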

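Finally, the weak form of the second Fréchet derivative is

$D^2_{\mathcal{U}}G(\lambda,\mathcal{U}) : \Lambda \times [X \times X] \to Y$ defined by
$\mathbf{g} = D^2_{\mathcal{U}}G(\lambda,\mathcal{U})[\hat{\mathcal{U}},\tilde{\mathcal{U}}]$ for $\hat{\mathcal{U}}, \tilde{\mathcal{U}} \in X$
if and only if

\[
\begin{aligned}
(\mathbf{g}_1,\mathbf{V}) + (\mathbf{g}_2,\mathbf{v}) + (g_3,q) ={}
 & \Bigl( -(\nabla^t\hat{\mathbf{U}})^t + \nabla\hat{p} + \frac{1}{\nu}\bigl(\hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0) + (\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}}\bigr),\;
   \frac{1}{\nu}\bigl(\tilde{\mathbf{U}}^t\mathbf{v} + \mathbf{V}^t\tilde{\mathbf{u}}\bigr) \Bigr)_0 \\
 & + \Bigl( -(\nabla^t\tilde{\mathbf{U}})^t + \nabla\tilde{p} + \frac{1}{\nu}\bigl(\tilde{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0) + (\mathbf{U}+\mathbf{U}_0)^t\tilde{\mathbf{u}}\bigr),\;
   \frac{1}{\nu}\bigl(\hat{\mathbf{U}}^t\mathbf{v} + \mathbf{V}^t\hat{\mathbf{u}}\bigr) \Bigr)_0 \\
 & + \Bigl( \frac{1}{\nu}\bigl(\hat{\mathbf{U}}^t\tilde{\mathbf{u}} + \tilde{\mathbf{U}}^t\hat{\mathbf{u}}\bigr), \\
 & \qquad -(\nabla^t\mathbf{V})^t + \nabla q + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) \Bigr)_0
\end{aligned}
\tag{43}
\]

for all $(\mathbf{V},\mathbf{v},q) \in X$.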

The next lemma summarizes the technical results that we use in the sequel. 

Lemma 4. Let $D_i$ denote the derivative with respect to the $i$th coordinate variable in $\mathbb{R}^n$,
$1 \le i \le n$, and assume that $u$, $v$, $w$, and $z$ are in $H^1(\Omega)$. Then

\[ \int_\Omega D_i u\, v\, w \, d\Omega \le C \|u\|_1 \|v\|_1 \|w\|_1, \tag{44} \]

$1 \le i \le n$, and

\[ \int_\Omega u\, v\, w\, z \, d\Omega \le C \|u\|_1 \|v\|_1 \|w\|_1 \|z\|_1. \tag{45} \]

The mapping $(u,v) \mapsto uv$ is a continuous bilinear mapping from $L^2(\Omega) \times H^1(\Omega)$ into
$L^{3/2}(\Omega)$ and the mapping $(u,v,w) \mapsto uvw$ is a continuous trilinear mapping from
$H^1(\Omega) \times H^1(\Omega) \times H^1(\Omega)$ into $L^{3/2}(\Omega)$. That is,

\[ \|uv\|_{0,3/2} \le C \|u\|_{0,2} \|v\|_{1,2} \quad \text{for all } u \in L^2(\Omega) \text{ and } v \in H^1(\Omega), \tag{46} \]

\[ \|uvw\|_{0,3/2} \le C \|u\|_{1,2} \|v\|_{1,2} \|w\|_{1,2} \quad \text{for all } u, v, w \in H^1(\Omega). \tag{47} \]

Proof. The first part of the lemma follows easily from the imbedding $H^1(\Omega) \subset L^4(\Omega)$
in two and three dimensions and the Hölder inequality. The second part follows directly
from a result in [6]. □

For a more general version of (46) and (47), see [7].
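The exponent bookkeeping behind estimates of this type can be written out explicitly. The following is a sketch of one standard route, assuming a bounded Lipschitz domain in two or three dimensions so that $H^1(\Omega)$ is continuously imbedded in $L^q(\Omega)$ for $q \le 6$; the paper itself cites [6] for the second part, so the argument for (46) and (47) below is an illustration rather than the authors' proof.

```latex
\begin{align*}
\text{(44):}\quad & \Bigl|\int_\Omega D_i u\, v\, w\, d\Omega\Bigr|
   \le \|D_i u\|_{0,2}\,\|v\|_{0,4}\,\|w\|_{0,4}
   \le C\,\|u\|_1\|v\|_1\|w\|_1,
   && \tfrac12 + \tfrac14 + \tfrac14 = 1, \\
\text{(45):}\quad & \Bigl|\int_\Omega u\, v\, w\, z\, d\Omega\Bigr|
   \le \|u\|_{0,4}\,\|v\|_{0,4}\,\|w\|_{0,4}\,\|z\|_{0,4}
   \le C\,\|u\|_1\|v\|_1\|w\|_1\|z\|_1,
   && 4\cdot\tfrac14 = 1, \\
\text{(46):}\quad & \|uv\|_{0,3/2} \le \|u\|_{0,2}\,\|v\|_{0,6}
   \le C\,\|u\|_{0,2}\,\|v\|_{1,2},
   && \tfrac12 + \tfrac16 = \tfrac23, \\
\text{(47):}\quad & \|uvw\|_{0,3/2}
   \le \|u\|_{0,9/2}\,\|v\|_{0,9/2}\,\|w\|_{0,9/2}
   \le C\,\|u\|_{1,2}\|v\|_{1,2}\|w\|_{1,2},
   && 3\cdot\tfrac29 = \tfrac23 .
\end{align*}
```

Each line is the generalized Hölder inequality with the exponents indicated on the right, followed by a Sobolev imbedding of $H^1(\Omega)$ into $L^4(\Omega)$, $L^6(\Omega)$, or $L^{9/2}(\Omega)$.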

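In the next lemma, we establish properties of $G$ that are required for the validity of the
approximation result in Theorem 1.

Lemma 5. Assume that the mapping $G$ is defined by (33). For $X$, $Y$, and $Z$ given by
(20), (29), and (30), respectively, the following are true:

1. For all $\mathcal{U} \in X$, $D_{\mathcal{U}}G(\lambda,\mathcal{U}) \in L(X,Z)$.

2. The second Fréchet derivative $D^2_{\mathcal{U}}G(\lambda,\mathcal{U})$ is bounded on bounded subsets of $\Lambda \times X$.

Proof. To prove 1, consider strong form (40)-(42) of $D_{\mathcal{U}}G(\lambda,\mathcal{U})$. By assumption,
$\mathcal{U} \in X$; that is, $\mathbf{U} \in H^1(\Omega)^{n^2}$, $\mathbf{u} \in H^1(\Omega)^n$, and $p \in H^1(\Omega)$. Now each of the equations (40),
(41), and (42) consists of terms of the form $D_i u\, v$ and $uvw$, where $u$, $v$, and $w$ belong to
$H^1(\Omega)$, so the second part of Lemma 4 implies that $(\mathbf{g}_1,\mathbf{g}_2,g_3) \in Z$. Using (46) and (47),
it also follows that

\[ \|D_{\mathcal{U}}G(\lambda,\mathcal{U})\hat{\mathcal{U}}\|_Z \le C \|\hat{\mathcal{U}}\|_X, \]

i.e., that $D_{\mathcal{U}}G(\lambda,\mathcal{U}) \in L(X,Z)$.

To prove 2, consider weak form (43) of the second Fréchet derivative. Assume that
$(\lambda,\mathcal{U})$ belongs to a bounded subset of $\Lambda \times X$ and let $\hat{\mathcal{U}} \in X$ and $\tilde{\mathcal{U}} \in X$ be arbitrary.
Then it is not difficult to see that weak form (43) involves only terms of the form $D_i u\, v\, w$
and $uvwz$, where $u$, $v$, $w$, and $z$ belong to $H^1(\Omega)$. Thus, each term can be estimated using
(44) or (45):

\[ |(\mathbf{g}_1,\mathbf{V})| \le C_1(\mathcal{U},\mathcal{U}_0,\lambda)\,(\|\hat{\mathcal{U}}\|_X + \|\tilde{\mathcal{U}}\|_X)\,\|\mathbf{V}\|_1; \]

\[ |(\mathbf{g}_2,\mathbf{v})| \le C_2(\mathcal{U},\mathcal{U}_0,\lambda)\,(\|\hat{\mathcal{U}}\|_X + \|\tilde{\mathcal{U}}\|_X)\,\|\mathbf{v}\|_1; \]

\[ |(g_3,q)| \le C_3(\mathcal{U},\mathcal{U}_0,\lambda)\,(\|\hat{\mathcal{U}}\|_X + \|\tilde{\mathcal{U}}\|_X)\,\|q\|_1; \]

where $C_i$ is a polynomial function of $\|\mathcal{U}\|_X$, $\|\mathcal{U}_0\|_X$, and the parameter $\lambda$. It then follows that
$D^2_{\mathcal{U}}G(\lambda,\mathcal{U})$ is bounded in the norm of $L(X, L(X,Y))$ on all bounded subsets of $\Lambda \times X$. □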


This completes verification of all assumptions of Theorem 1. As a result, we can conclude
that error estimates (36) and (37) hold for the least-squares finite element approximation
as long as problem (22) has a regular branch of solutions with sufficient regularity.


WELL-POSEDNESS OF THE LEAST-SQUARES FORM

In this section, we address the question of the well-posedness of least-squares formulation
(22). More precisely, our aim is to show that if $\{(\lambda,(\mathbf{u}(\lambda),p(\lambda))) \;|\; \lambda \in \Lambda\}$ is a branch
of regular solutions of original velocity-pressure Navier-Stokes problem (1)-(4), then

\[ \{(\lambda,(\mathbf{U}(\lambda),\mathbf{u}(\lambda),p(\lambda))) \;|\; \lambda \in \Lambda\} \]

is a regular branch for variational problem (22). This is an important question not only
because application of Theorem 1 requires a regular branch, but also because it would assert
that the least-squares formulation does not introduce bifurcation phenomena that are not
already present in the original equations. The question is also nontrivial since the equivalent
strong form of (22) now involves derivatives of nonlinear equations (1)-(2).

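Assume that $(\mathbf{u}(\lambda),p(\lambda)) \in H^1_0(\Omega)^n \times L^2_0(\Omega)$ yields a regular branch of solutions of
(1)-(4), i.e., for every $\lambda \in \Lambda$ the pair $(\mathbf{u}(\lambda),p(\lambda))$ is a nonsingular (weak) solution of the
Navier-Stokes equations. We recall the result of [6] that $(\mathbf{u},p)$ is a nonsingular solution if
and only if the linearized problem

\[
\begin{aligned}
-\nu\Delta\hat{\mathbf{u}} + (\nabla\mathbf{u}^t)^t\hat{\mathbf{u}} + (\nabla\hat{\mathbf{u}}^t)^t\mathbf{u} + \nabla\hat{p} &= \mathbf{f}^* && \text{in } \Omega, \\
\nabla^t\hat{\mathbf{u}} &= 0 && \text{in } \Omega, \\
\hat{\mathbf{u}} &= \mathbf{0} && \text{on } \Gamma, \\
\int_\Omega \hat{p}\, d\Omega &= 0
\end{aligned}
\]

has a unique (weak) solution $(\hat{\mathbf{u}},\hat{p}) \in H^1_0(\Omega)^n \times L^2_0(\Omega)$ for each $\mathbf{f}^* \in H^{-1}(\Omega)^n$. Specialized
to our needs, the nonsingularity assumption asserts that the problem

\[ -\nu\Delta\hat{\mathbf{u}} + (\nabla\hat{\mathbf{u}}^t)^t(\mathbf{u}+\mathbf{u}_0) + (\nabla(\mathbf{u}^t+\mathbf{u}_0^t))^t\hat{\mathbf{u}} + \nabla\hat{p} = \mathbf{f}^* \quad \text{in } \Omega, \tag{48} \]

\[ \nabla^t\hat{\mathbf{u}} = 0 \quad \text{in } \Omega, \tag{49} \]

\[ \hat{\mathbf{u}} = \mathbf{0} \quad \text{on } \Gamma, \tag{50} \]

\[ \int_\Omega \hat{p}\, d\Omega = 0 \tag{51} \]

has a unique (weak) solution $(\hat{\mathbf{u}},\hat{p}) \in H^1_0(\Omega)^n \times L^2_0(\Omega)$ for each $\mathbf{f}^* \in H^{-1}(\Omega)^n$, where
$(\mathbf{u}_0,p_0)$ solves Stokes problem (12) with the original data $\mathbf{f}$.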

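Under this assumption, well-posedness of (22) will follow if we can establish that
$\mathcal{U}(\lambda) = (\mathbf{U}(\lambda),\mathbf{u}(\lambda),p(\lambda))$ with $\mathbf{U}(\lambda) = \nabla\mathbf{u}(\lambda)^t$ is a nonsingular solution of (22)
for all $\lambda \in \Lambda$. In terms of canonical representation (26), this amounts to showing that the
linearized mapping $D_{\mathcal{U}}F(\lambda,\mathcal{U})$ is an isomorphism of $X$; that is, the linearized equation

\[ D_{\mathcal{U}}F(\lambda,\mathcal{U})\hat{\mathcal{U}} = (I + T\cdot D_{\mathcal{U}}G(\lambda,\mathcal{U}))\hat{\mathcal{U}} = \mathcal{V} \tag{52} \]

has a unique solution $\hat{\mathcal{U}} \in X$ for all $\mathcal{V} \in X$.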

Compactness of $T : Z \to X$ follows from (35), which asserts that it is a uniform limit
of compact operators $T_h$. Now, from Lemma 5, we have $D_{\mathcal{U}}G(\lambda,\mathcal{U}) \in L(X,Z)$, so the
operator $T\cdot D_{\mathcal{U}}G(\lambda,\mathcal{U}) : X \to X$ is compact. Thus, the Fredholm alternative can be
applied to (52), and we can assert that $D_{\mathcal{U}}F(\lambda,\mathcal{U})$ is indeed an isomorphism of $X$ if and
only if the homogeneous equation

\[ D_{\mathcal{U}}F(\lambda,\mathcal{U})\hat{\mathcal{U}} = (I + T\cdot D_{\mathcal{U}}G(\lambda,\mathcal{U}))\hat{\mathcal{U}} = 0 \tag{53} \]

has only the trivial solution $\hat{\mathcal{U}} = 0$ in $X$. This fact is established in the next lemma using
our nonsingularity assumption on $(\mathbf{u}(\lambda),p(\lambda))$.

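Lemma 6. Assume that $(\mathbf{u},p)$ is such that linearized equations (48)-(51) have a unique
solution for each $\mathbf{f}^* \in H^{-1}(\Omega)^n$. Then homogeneous problem (53) has only the trivial
solution.

Proof. Using definitions (31) and (33), one can easily verify that (53) is equivalent to
the variational problem

Find $(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p}) \in X$ such that

\[
\begin{aligned}
 & \Bigl( -(\nabla^t\hat{\mathbf{U}})^t + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) + \nabla\hat{p}, \\
 & \quad -(\nabla^t\mathbf{V})^t + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\mathbf{v} + \mathbf{V}^t(\mathbf{u}+\mathbf{u}_0)\bigr) + \nabla q \Bigr)_0 \\
 & + (\nabla^t\hat{\mathbf{u}}, \nabla^t\mathbf{v})_0 + (\nabla(\mathrm{tr}\,\hat{\mathbf{U}}), \nabla(\mathrm{tr}\,\mathbf{V}))_0 \\
 & + (\hat{\mathbf{U}}-\nabla\hat{\mathbf{u}}^t, \mathbf{V}-\nabla\mathbf{v}^t)_0 + (\nabla\times\hat{\mathbf{U}}, \nabla\times\mathbf{V})_0 = 0
\end{aligned}
\tag{54}
\]

for all $(\mathbf{V},\mathbf{v},q) \in X$.

Variational problem (54) is evidently the Euler-Lagrange equation for the minimization
problem

Find $(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p}) \in X$ such that

\[ J_\ell(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p}) \le J_\ell(\mathbf{V},\mathbf{v},q) \quad \text{for all } (\mathbf{V},\mathbf{v},q) \in X, \tag{55} \]

where

\[
\begin{aligned}
J_\ell(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p}) ={}
 & \Bigl\| -(\nabla^t\hat{\mathbf{U}})^t + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) + \nabla\hat{p} \Bigr\|_0^2 \\
 & + \|\nabla^t\hat{\mathbf{u}}\|_0^2 + \|\hat{\mathbf{U}}-\nabla\hat{\mathbf{u}}^t\|_0^2
   + \|\nabla(\mathrm{tr}\,\hat{\mathbf{U}})\|_0^2 + \|\nabla\times\hat{\mathbf{U}}\|_0^2 .
\end{aligned}
\tag{56}
\]

Thus, nonsingularity of $(\mathbf{U},\mathbf{u},p)$ would follow if we could show that (55) has no nontrivial
minimizers. Assume the contrary. Since $J_\ell$ is nonnegative and vanishes at the trivial element,
its minimum value is zero, so the nontrivial minimizer $(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p})$ satisfies

\[ -(\nabla^t\hat{\mathbf{U}})^t + \frac{1}{\nu}\bigl((\mathbf{U}+\mathbf{U}_0)^t\hat{\mathbf{u}} + \hat{\mathbf{U}}^t(\mathbf{u}+\mathbf{u}_0)\bigr) + \nabla\hat{p} = \mathbf{0}, \tag{57} \]

\[ \hat{\mathbf{U}} - \nabla\hat{\mathbf{u}}^t = \mathbf{0}, \tag{58} \]

\[ \nabla^t\hat{\mathbf{u}} = 0. \tag{59} \]

Then from equations (57) and (58) and identities $\mathbf{U} = \nabla\mathbf{u}^t$ and $\mathbf{U}_0 = \nabla\mathbf{u}_0^t$, we conclude
that the pair $(\hat{\mathbf{u}},\hat{p})$ satisfies

\[ -\nu\Delta\hat{\mathbf{u}} + (\nabla(\mathbf{u}^t+\mathbf{u}_0^t))^t\hat{\mathbf{u}} + (\nabla\hat{\mathbf{u}}^t)^t(\mathbf{u}+\mathbf{u}_0) + \nabla\hat{p} = \mathbf{0}. \]

Now the premise that $(\hat{\mathbf{U}},\hat{\mathbf{u}},\hat{p})$ is nontrivial, together with (58), implies that $(\hat{\mathbf{u}},\hat{p})$ is
nontrivial. Since (59) is also satisfied, then $(\hat{\mathbf{u}},\hat{p})$ is also a nontrivial solution of (48)-(51)
with $\mathbf{f}^* = \mathbf{0}$, which is a contradiction. □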


MULTIGRID SOLVER FOR THE DISCRETE SYSTEM 


Here we consider a simple iterative method applied to (27) and show that it converges
linearly with bound uniform in $h$ and $\lambda$. Our approach rests on using a multigrid
preconditioner for $T_h$ and observing that the operator in (27) is well-conditioned uniformly in $h$ and
$\lambda$. The development is greatly simplified by basing the analysis on the inner product $\mathcal{B}_S(\cdot,\cdot)$
defined in (31) and by choosing elements of the multigrid-based algorithm that are very easy
to analyze. (Most assumptions are made only for convenience; more general conditions can
be handled with more cumbersome but straightforward arguments. However, allowing for
the more effective direct treatment of the nonlinearity within the multigrid process would
require much more sophisticated analysis tools than we use here.)

Let $M_h$ be defined so that $\mathcal{U}^h = M_h\mathbf{g}$ represents one or more cycles of (additive or
multiplicative) multigrid applied to problem (32), starting from the initial guess $\mathcal{U}^h = 0$.
For simplicity, assume that $M_h$ is symmetric in the $\mathcal{B}_S(\cdot,\cdot)$ inner product (e.g., $M_h$ may
consist of one relaxation of point Gauss-Seidel with a given ordering before coarsening and
one relaxation with the reverse ordering afterwards). Again for simplicity, assume that $M_h$
is so effective that

\[ \delta\,\mathcal{B}_S(T_h\mathcal{V}^h,\mathcal{V}^h) \le \mathcal{B}_S(M_h\mathcal{V}^h,\mathcal{V}^h) \le \mathcal{B}_S(T_h\mathcal{V}^h,\mathcal{V}^h) \tag{60} \]

for all $\mathcal{V}^h \in X_h$ and for some positive constant $\delta$ independent of $h$ and $\lambda$. The upper bound
can be assured simply by dividing the usual multigrid cycle by 2, and the lower bound
follows from the product $H^1$ equivalence of $\mathcal{B}_S(\cdot,\cdot)$ established in [3]. Assume that

\[ \{(\lambda,\mathcal{U}(\lambda)) \;|\; \lambda \in \Lambda\} \]

is a branch of regular solutions of (26), and let $F^h(\lambda,\mathcal{U}^h) = 0$ denote canonical form (27).
Then it is easy to see that there exists a neighborhood $\mathcal{O}$ of the origin in $X$ and positive
constants $\gamma$ and $\beta$, independent of $h$ and $\lambda$, such that

\[ \gamma\,\mathcal{B}_S(\mathcal{V}^h,\mathcal{V}^h) \le \mathcal{B}_S(D_{\mathcal{U}}F^h(\lambda,\mathcal{U})\mathcal{V}^h,\mathcal{V}^h) \le \beta\,\mathcal{B}_S(\mathcal{V}^h,\mathcal{V}^h) \tag{61} \]

for all $\mathcal{V}^h \in X_h$, where $(\lambda,\mathcal{U})$ is any element of $\Lambda \times X_h$ for which $\mathcal{U}(\lambda) - \mathcal{U} \in \mathcal{O}$. The
lower bound follows from our regular branch assumption, and the upper bound follows from
Lemma 2 and property 1 of Lemma 5.

The iterative method that we consider for solving (27) is given by the expression

\[ \mathcal{U}^h \leftarrow \mathcal{U}^h - s\,M_h\nabla J(\mathcal{U}^h), \tag{62} \]

where $J(\mathcal{U}^h)$ is the functional in (19) and $s$ is a fixed step length [value illegible in the
source]. Suppose for the moment that $M_h = T_h$. Then the proof of local linear convergence
of (62) in the $\mathcal{B}_S(\cdot,\cdot)$ norm with linear factor bounded by $\sqrt{1 - (\,\cdot\,)}$ [argument illegible in
the source] would follow from: linearizing $\nabla J(\mathcal{U}^h)$ about the solution of (27); the
relation $T_h\nabla J(\mathcal{U}^h) = F^h(\lambda,\mathcal{U}^h)$; and the symmetry of $D_{\mathcal{U}}F^h(\lambda,\mathcal{U})$ in the $\mathcal{B}_S(\cdot,\cdot)$ inner
product. For (62) with general $M_h$, we can then use (60) to prove local linear convergence
in the $\mathcal{B}_S(\cdot,\cdot)$ norm with factor bounded by $\sqrt{1 - (\,\cdot\,)}$ [argument illegible in the source].
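The structure of an iteration of the form "current iterate minus step length times preconditioner times gradient" can be tried out on a small quadratic model problem. The sketch below is an illustration only and makes several assumptions that are not in the paper: the functional is the quadratic $J(u) = \frac12 u^T A u - b^T u$ for a hypothetical 2x2 SPD matrix, the preconditioner is a diagonal (Jacobi) scaling standing in for the multigrid operator, and the step length is taken as 1. It records the error in the energy norm, which plays the role of the $\mathcal{B}_S$ norm in this toy setting.

```python
def energy_norm(A, e):
    """Return sqrt(e^T A e), the norm induced by the SPD matrix A."""
    Ae = [sum(a * ej for a, ej in zip(row, e)) for row in A]
    return sum(ei * aei for ei, aei in zip(e, Ae)) ** 0.5


def preconditioned_descent(A, b, M_diag, s, steps):
    """Iterate u <- u - s * M * grad J(u) for J(u) = 0.5 u^T A u - b^T u,
    where M is a diagonal preconditioner given by its diagonal M_diag."""
    u = [0.0] * len(b)
    history = [u[:]]
    for _ in range(steps):
        # Gradient of the quadratic functional: A u - b.
        grad = [sum(a * uj for a, uj in zip(row, u)) - bi
                for row, bi in zip(A, b)]
        u = [ui - s * mi * gi for ui, mi, gi in zip(u, M_diag, grad)]
        history.append(u[:])
    return history


# Hypothetical model data (not from the paper).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]          # solution of A u = b
history = preconditioned_descent(A, b, M_diag=[0.25, 1.0 / 3.0], s=1.0, steps=10)
errors = [energy_norm(A, [ui - xi for ui, xi in zip(u, exact)])
          for u in history]
```

For this model the preconditioned operator has eigenvalues $1 \pm 1/\sqrt{12}$, so each step reduces the energy-norm error by a fixed factor of about 0.29; this is the kind of linear convergence with a bound independent of the iterate that the analysis above is after.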

This establishes optimality of our simple iterative method based on a multigrid Stokes
preconditioner. It is straightforward to extend this result to a full-multigrid-like scheme,
where an approximation to the solution of the Navier-Stokes equations is achieved with
accuracy up to discretization error at the cost of a few fine grid operator evaluations.


SUMMARY

MGLab is a set of Matlab functions that defines an interactive environment for 
experimenting with multigrid algorithms. The package solves two-dimensional ellip- 
tic partial differential equations discretized using either finite differences or finite vol- 
umes, depending on the problem. Built-in problems include the Poisson equation, the 
Helmholtz equation, a convection-diffusion problem, and a discontinuous coefficient
problem. A number of parameters controlling the multigrid V-cycle can be set us- 
ing a point-and-click mechanism. The menu-based user interface also allows a choice 
of several Krylov subspace methods, including CG, GMRES(k), and Bi-CGSTAB, 
which can be used either as stand-alone solvers or as multigrid acceleration schemes. 
The package exploits Matlab’s visualization and sparse matrix features and has been 
structured to be easily extensible. 

WHAT IS MGLab? 

MGLab is an interactive environment based on Matlab Version 4.0 for solving 
elliptic partial differential equations using multigrid algorithms. A graphical user 
interface (GUI) enables the user to select a problem, set parameters for the multigrid 
V-cycle, optionally choose a Krylov subspace accelerator, and visualize the results. 
MGLab is written in Matlab [1], which has greatly simplified the programming but 
has led to some loss of efficiency in a few respects. 

A number of very good introductions to multigrid methods are available that can
be used in conjunction with MGLab, including refs. [2]-[4]. The numerical treatment
of elliptic partial differential equations is discussed in ref. [5], and the finite
volume method for discretizing elliptic problems is described in ref. [6]. Some of
the experiments described in ref. [7] have been included in MGLab as demos. Some
software that addresses similar issues is described in refs. [8]-[14]. The basic linear
algebra concepts needed for a number of the components of MGLab, including the
iterative solvers, are discussed in refs. [15]-[23]. Other references that may be useful
background reading for MGLab users include [24]-[34].


THE GRAPHICAL USER INTERFACE


The interface between MGLab and the user is a menu structure; the menu items 
can be selected using a point-and-click mechanism. Menu items are grouped according 
to their function, depending on whether they relate to the partial differential equation, 
the solver, multigrid parameters, visualization of results, or built-in demos. Top level 
menu choices and their submenus are outlined below. 


MGLab
    Run
    Show Params
    Version Info
    Reset
    Restart
    Quit

The submenus in the MGLab top-level menu item control the basic behavior of the
package, including solving the currently selected problem, displaying the currently
selected parameters, and restarting MGLab with the default parameters.


Problem
    Poisson
    Helmholtz >
    Convection-Diffusion >
    Cut-Square >
    Problem Size >

The submenus in the Problem top-level menu item select which partial differential
equation to solve; further submenus are available for setting problem-dependent
parameters. The problem size can also be set here.


Solver
    V-Cycle
    CG
    Bi-CGSTAB
    CGS
    GMRES(k) >
    SOR >
    Full-Multigrid
    Preconditioner >
    Stopping Criteria >

The submenus in the Solver top-level menu item are used to select the solver, choose
a preconditioner if desired, and set the stopping criteria. The GMRES menu item has
a submenu for choosing the GMRES restart parameter, and the SOR menu item has a
submenu for choosing the SOR relaxation parameter.


MG-Parameters
    Number of Levels >
    Smoother >
    Restriction >
    Prolongation >
    Coarse-grid Solver >
    Coarse-grid Operator >
    MG Cycle >

The submenus in the MG-Parameters top-level menu item are used to set various
multigrid parameters, including the number of grid levels, the smoother, the
restriction and prolongation operators, the solver for the coarse grid problem, the
method for generating the operators on the coarser grids, and the type of multigrid
cycle, such as the V-cycle or the W-cycle.


Visualize
    Convergence History
    Computed Solution (surf)
    Computed Solution (pcolor)
    X-Axis >
    Y-Axis >

The submenus in the Visualize top-level menu item are used to view the results after
solving a problem. The convergence history can be plotted, the scaling along the x
and y axes for the convergence history plot can be chosen, and the numerical solution
can be displayed either as a surface plot or a contour plot.


Demos
    Smoothers
    Fourier analysis
    Truncation error

The submenus in the Demos top-level menu item select and run demonstrations that
illustrate specific properties of multigrid methods, such as the behavior of different
smoothers, how the errors after the coarse grid correction and after the post-smoothings
in the V-cycle behave in physical and Fourier space, and how the truncation error
compares with the discrete residual.


ELLIPTIC PROBLEMS 


The built-in test problems in the current version of MGLab are restricted to two- 
dimensional elliptic partial differential equations on rectangular domains. 

\[ -\nabla\cdot(a\nabla u) + \mathbf{b}\cdot\nabla u + cu = f \quad \text{in } \Omega, \tag{1} \]

\[ u = g \quad \text{on } \partial\Omega. \]

The domain $\Omega$ is the unit square $\{(x,y) : 0 < x, y < 1\}$, and the elliptic problem
is discretized using the standard 5-point stencil on a uniform mesh. Currently the
test problems all have zero Dirichlet boundary conditions. The matrices are stored
using Matlab's sparse storage format, which ensures that matrix-vector products are
efficient. Furthermore, the coarse grid problem can be solved using Matlab's built-in
sparse direct solver [35], which uses graph-theoretic techniques to reorder the rows
and columns of the matrix to reduce fill-in during the elimination process.

Even within the discretization and boundary condition restrictions, a number of
different types of elliptic problems are possible. MGLab's test suite includes the
Poisson equation, the Helmholtz equation, a convection-diffusion equation, and a
discontinuous coefficient problem ("cut-square").

Poisson Equation. The Poisson equation, $-\nabla^2 u = f$, is the easiest problem in
MGLab to solve; the coefficient matrix of the discretized equation is both symmetric
and positive definite.
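The symmetry and positive definiteness of the 5-point Poisson matrix can be checked directly on a small grid. The following Python sketch is not MGLab code: the function names are hypothetical, and a dense list-of-lists matrix is used for brevity where MGLab uses Matlab's sparse storage. It assembles the matrix for zero Dirichlet boundary conditions on the unit square and tests positive definiteness by attempting a Cholesky factorization.

```python
def poisson_5pt(n):
    """Dense matrix of the 5-point discretization of -Laplace(u) on the
    unit square with n x n interior points and zero Dirichlet boundary."""
    h = 1.0 / (n + 1)
    size = n * n
    A = [[0.0] * size for _ in range(size)]
    for j in range(n):
        for i in range(n):
            row = j * n + i                      # lexicographic ordering
            A[row][row] = 4.0 / h ** 2
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:  # skip boundary neighbors
                    A[row][jj * n + ii] = -1.0 / h ** 2
    return A


def is_symmetric(A):
    """True if A equals its transpose entry by entry."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(i))


def cholesky_succeeds(A):
    """True if the symmetric matrix A admits a Cholesky factorization,
    which is equivalent to A being positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


A = poisson_5pt(3)   # 9 x 9 matrix, mesh width h = 1/4
```

With $h = 1/4$ the diagonal entries are $4/h^2 = 64$ and the off-diagonal stencil entries are $-1/h^2 = -16$.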

Helmholtz Equation. The Helmholtz equation, $-\nabla^2 u + ku = f$, is the same as the
Poisson equation, except for the $ku$ term. Depending on $k$, this term can make the
problem indefinite or complex. The parameter $k$ can be selected by the user, where

\[ k \in \{-10, -5, -1, 0, 1, 5, 10, 10+i\}. \]

Convection-Diffusion Equation. The convection-diffusion equation,
$-\nabla^2 u + \lambda u_x + \sigma u = f$, adds the convection term $\lambda u_x$ to the Helmholtz equation. This added
term can make the coefficient matrix of the discretized problem nonsymmetric. The
parameters $\lambda$ and $\sigma$ can be selected by the user, where $\lambda \in \{0, 10, 100, 1000\}$ and
$\sigma \in \{-100, -50, 0, 5, 10, 20, 50, 100\}$.

Cut-Square Equation. The cut-square equation is $-\nabla\cdot(a\nabla u) = f$, where $a$ is
a discontinuous function of $x$ and $y$. Specifically, $a(x,y) = \alpha$ for $0.4 < x, y < 0.6$,
and $a(x,y) = 1$ elsewhere in $\Omega$. The parameter $\alpha$ can be selected by the user, where
$\alpha \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$.

MULTIGRID PARAMETERS 

MGLab is designed to solve elliptic partial differential equations using multigrid 
methods, with the option to embed the multigrid solver as a preconditioner in a 
Krylov subspace method. A number of parameters that determine the V-cycle can 
be set through the graphical user interface. These include the number of levels, 
the smoother, the number of pre- and post-smoothing sweeps, the restriction and 
prolongation operators, the coarse grid solver, and the type of multigrid cycle. 

Number of Levels. The number of grid levels can be chosen to be between 1 and 5.
Note that levels = 1 corresponds to a sparse direct solver. If the chosen number of 
levels is too large for the current problem size, it is set to the largest number possible. 

Smoothers. The available smoothers are weighted Jacobi, Gauss-Seidel, and Red/Black 
Gauss-Seidel. For the Jacobi smoother, the user can pick the weighting factor. 
The number of pre- and post-smoothing sweeps ($\nu_1$, $\nu_2$) can also be set through the
Smoother submenu. 
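The role of the Jacobi weighting factor can be seen in a one-dimensional analogue. The Python sketch below is not MGLab code; the function name, the 1D model problem $-u'' = f$, and the default weight $2/3$ are assumptions chosen for illustration. It applies weighted Jacobi sweeps and shows that a highly oscillatory error is damped by a factor of about three in a single sweep.

```python
def weighted_jacobi(u, f, h, omega=2.0 / 3.0, sweeps=1):
    """Apply weighted Jacobi sweeps to -u'' = f on a uniform 1D grid.
    u[0] and u[-1] hold Dirichlet boundary values and are not changed."""
    for _ in range(sweeps):
        old = u[:]                        # Jacobi uses only old values
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


# Zero right-hand side, so u itself is the error; start from an
# oscillatory error that alternates in sign between grid points.
u = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 0.0]
f = [0.0] * len(u)
weighted_jacobi(u, f, h=0.125, sweeps=1)
largest = max(abs(value) for value in u)
```

Smooth error components are reduced far more slowly by the same sweep, which is why the smoother is combined with a coarse grid correction.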

Restriction Operators. The restriction operators available in MGLab are injection,
half-weighting, and full-weighting. These are implemented with fairly compact
code that uses Matlab's colon notation for accessing arrays.

Prolongation Operators. MGLab offers bilinear and cubic interpolation as the
choices for prolongation. These are currently implemented by calls to Matlab's
interp2 function.
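One-dimensional analogues of these grid transfer operators fit in a few lines. The sketch below is in Python rather than Matlab and is not taken from MGLab; the function names are hypothetical, and the 1D full-weighting and linear interpolation formulas stand in for the 2D full-weighting and bilinear operators named above. Python's slice `fine[::2]` plays the role that colon notation plays in Matlab.

```python
def restrict_injection(fine):
    """Injection: keep every second fine-grid value."""
    return fine[::2]


def restrict_full_weighting(fine):
    """Full weighting from a grid with 2m+1 points to one with m+1 points;
    interior coarse values are (1/4, 1/2, 1/4) averages of fine values."""
    m = (len(fine) - 1) // 2
    coarse = [0.0] * (m + 1)
    coarse[0], coarse[m] = fine[0], fine[-1]   # boundary values are copied
    for j in range(1, m):
        coarse[j] = (0.25 * fine[2 * j - 1] + 0.5 * fine[2 * j]
                     + 0.25 * fine[2 * j + 1])
    return coarse


def prolong_linear(coarse):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    m = len(coarse) - 1
    fine = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        fine[2 * j] = coarse[j]                # coincident points
    for j in range(m):
        fine[2 * j + 1] = 0.5 * (coarse[j] + coarse[j + 1])
    return fine


coarse = [0.0, 1.0, 2.0]
fine = prolong_linear(coarse)
```

A linear function is reproduced exactly by the interpolation, and both restrictions map the interpolated values back to the original coarse values.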

Coarse Grid Solver. The default coarse grid solver is Matlab’s built-in sparse 
direct solver [35]. As an alternative, the user can choose to use the smoother as the 
coarse grid solver. This is less costly but also less accurate. 

Multigrid Cycle. Although the V-cycle is the default multigrid cycle, the user can 
also select the W-cycle. Other cycles, such as the half V-cycle or weighted V-cycle, 
could be added easily. Full multigrid can be selected in the Solver menu. 

KRYLOV SUBSPACE ACCELERATORS 

The V-cycle defined through the multigrid parameters discussed previously can
be used as an iterative solver on its own or as a preconditioner for Krylov subspace
methods, such as CG, GMRES(k), and Bi-CGSTAB. For solving the linear system of
equations $Ax = b$, these methods work with a sequence of Krylov subspaces defined
by

\[ K_j(r_0, A) = \mathrm{span}\{r_0, Ar_0, \ldots, A^{j-1}r_0\}. \tag{2} \]

The $j$-th iterate $x_j$ is picked from

\[ x_j \in x_0 + K_j(r_0, A), \]

where $r_0$ is the initial residual $b - Ax_0$.

Below we list some of the important properties of the methods; details of the 
algorithms can be found in the references given with each method. See also refs. [19] 
and [22]. 

CG. The Conjugate Gradient method is a Krylov subspace accelerator for symmetric
positive definite (SPD) systems; this method minimizes the $A$-norm of the error
at each iteration. The preconditioned version (PCG) requires a symmetric positive
definite preconditioner. The CG method was developed by Hestenes and Stiefel [16]
and is discussed in ref. [15].
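For readers who want to see the short recurrences of CG spelled out, here is a minimal unpreconditioned sketch in Python. It is a textbook formulation written for illustration, not MGLab's implementation; the function names, the dense list-of-lists matrix, and the 2x2 test system are assumptions.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Unpreconditioned CG for an SPD matrix A, starting from x = 0.
    Returns the approximate solution and the number of iterations."""
    x = [0.0] * len(b)
    r = b[:]                 # residual b - A x for the zero initial guess
    p = r[:]                 # first search direction
    rr = dot(r, r)
    iterations = 0
    while iterations < max_iter and rr ** 0.5 > tol:
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        # New direction from the residual and the previous direction only.
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations


A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical SPD test matrix
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
```

Only the current residual and one previous search direction are stored, which is the storage advantage over GMRES mentioned in the next paragraph of the text.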

GMRES(k). The Generalized Minimum Residual method of Saad and Schultz
[18] is a direct generalization of the CG method to matrices that are not SPD. The 
CG method takes advantage of a three-term recurrence relation that is not available 
in GMRES, so both the number of vectors that must be stored and the number of 
floating point operations performed increase with each iteration. For this reason, 
GMRES is typically restarted every k iterations. 

CGS. Conjugate Gradient Squared is a variant of the Bi-Conjugate Gradient (Bi-CG)
method that, unlike Bi-CG, avoids multiplication by the transpose of the matrix
$A$. The CGS method was proposed by Sonneveld [21]. Unlike GMRES, this method
is not guaranteed to minimize the 2-norm of the residual, but the number of vectors
required does not increase with each iteration so CGS does not need to be restarted.
The convergence behavior of CGS can be very erratic.

Bi-CGSTAB. This method was introduced by Van der Vorst [20] and is transpose- 
free like CGS, but with a more regular convergence behavior. 

SOR. The Successive Over-Relaxation method [17] is a stationary iterative method
with a relaxation parameter $\omega$. If $\omega = 1$, then SOR reduces to the Gauss-Seidel
method.
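The relation between SOR and Gauss-Seidel is visible in a single sweep. The following Python sketch uses a textbook formulation with a dense matrix; the function name and the 2x2 test system are hypothetical and are not taken from MGLab.

```python
def sor_sweep(A, b, x, omega):
    """One SOR sweep for A x = b, updating x in place.
    With omega = 1 the update is exactly a Gauss-Seidel sweep."""
    n = len(b)
    for i in range(n):
        # Uses already-updated entries x[j] for j < i.
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        gauss_seidel = (b[i] - sigma) / A[i][i]
        x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

# One sweep with omega = 1 from a zero initial guess (Gauss-Seidel).
gs = sor_sweep(A, b, [0.0, 0.0], omega=1.0)

# Repeated sweeps with a mild over-relaxation converge to the solution.
x = [0.0, 0.0]
for _ in range(50):
    sor_sweep(A, b, x, omega=1.1)
```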

PRECONDITIONERS 

The performance of iterative methods can often be enhanced with preconditioning
by premultiplying the linear system $Ax = b$ by an approximate inverse $M^{-1}$ of $A$:

\[ M^{-1}Ax = M^{-1}b. \tag{3} \]

The multigrid V-cycle can be used as a preconditioner. In addition, even though 
our emphasis is on multigrid methods, MGLab allows the V-cycle preconditioner to 
be replaced by something else. The current preconditioners available in MGLab for 
the Krylov subspace methods are the V-cycle, point Jacobi, and point Gauss-Seidel 
methods. Other preconditioners, such as the block Jacobi, red-black Gauss-Seidel, 
ILU, and SSOR methods, could be added relatively easily. 

A standardized interface to the preconditioner is available that is independent of
the iterative solver. The operation $z \leftarrow M^{-1} r$ is performed by the following call:

z = precondition(A, r).

The function precondition accesses the parameters needed to apply the preconditioner
$M^{-1}$, which is implicitly defined in terms of A. This enhances the extensibility
of MGLab in the sense that adding Matlab implementations of other iterative meth- 
ods would be straightforward. 
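The idea of a solver-independent preconditioner interface can be sketched in Python as follows. This is an illustration under stated assumptions, not the MGLab implementation: the names `pcg`, `jacobi_precondition`, `matvec`, and `dot` are hypothetical, and only the call shape z = precondition(A, r) is taken from the text. Any function with that call shape, including a V-cycle, could be passed in place of the point Jacobi example.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def jacobi_precondition(A, r):
    """Apply z = M^{-1} r with M = diag(A) (point Jacobi)."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


def pcg(A, b, precondition, tol=1e-12, max_iter=100):
    """Preconditioned CG for SPD A; the solver only ever calls precondition(A, r)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precondition(A, r)
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precondition(A, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Illustrative 2 x 2 SPD system with exact solution [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = pcg(A, b, jacobi_precondition)
```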

VISUALIZATION OPTIONS 

MGLab exploits Matlab’s powerful graphics capabilities to plot the convergence 
history of the solution process and to visualize the computed solution. Currently 
we make use of the plot, surf, pcolor, and contour commands in Matlab. The 
visualization options are available through the graphical user interface.

    function u_out = vCycle(level, b, u_in)

    % Use the zero vector for u_in as the default
    if nargin == 2,
        u_in = zeros(size(b));
    end

    if level == coarsest(level)
        u_out   = coarse_grid_solve(level, b);
    else
        u       = smooth(level, b, u_in, 'pre');
        r       = residual(level, b, u);
        b_c     = restrict(level, r);
        u_c     = vCycle(level+1, b_c);
        correct = interpolate(level, u_c);
        u       = u + correct;
        u_out   = smooth(level, b, u, 'post');
    end

Figure 1: V-cycle Function

SOME COMMENTS ON THE INTERNAL STRUCTURE OF MGLab 

MGLab is written entirely in Matlab. One group of functions is devoted to the 
user interface; these make use of Matlab’s uimenu function. Other groups of functions 
implement the problem generation, algorithms, and visualization in MGLab. 

MGLab makes use of Matlab’s global mechanism. This approach leads to a 
considerable simplification of the programming in many situations but carries the 
software engineering risk of non-transparent code and the danger of subtle bugs. We 
have attempted to write the higher level functions such as sp_laplace, Vcycle, pcg,
and precondition in a way that does not require them to see the global variables. 
This results in very compact and readable code and reduces the chance that global 
variables will be accidentally damaged. The code for Vcycle is shown in Figure 1 to 
illustrate this approach. 

The “middle level” functions, such as smooth and restrict, access the global 
workspace in a disciplined manner. Some low level functions were created expressly 
for the purpose of accessing the globals and returning a single value so that the globals 
could be hidden from the higher level functions.

BUILT-IN DEMOS IN MGLab


MGLab has a working and extensible framework for adding numerical experiments.
Currently, the numerical experiments supplied with MGLab are the following:

• A numerical study of the smoothing properties of the weighted Jacobi, Gauss-Seidel,
and Red/Black Gauss-Seidel approaches, in physical and Fourier space
[7]. The Fourier transforms are constructed out of Matlab’s fast Fourier transform
(fft).

• A numerical Fourier analysis of the complementary roles of the coarse grid
correction and the smoother for a model problem.

• A comparison of the truncation error (pde error) and the discrete residual [7]. 
This demo highlights the ability of multigrid methods to achieve truncation 
error accuracy very rapidly. 

Figures 2 through 5 show the output of Demo 2. The intention of this demo 
is to give a visual sense for the different roles of the coarse grid correction and the 
(post-)smoothing. In this demo, we solve the Poisson problem on a 49 x 49 mesh by 
multigrid. The V-cycle parameters are as follows: 

• Two levels 

• Gauss-Seidel smoothing 

• $(\nu_1, \nu_2) = (0, 4)$, i.e., no pre-smoothing and 4 post-smoothing sweeps

• Half-weighting restriction 

• Cubic interpolation 

• The coarse grid solver is Matlab’s built-in sparse Gaussian elimination 

The initial guess was chosen so that the initial error had a mix of low and high 
frequencies: 

$$e^{(0)}(x, y) = \sum_{j=1}^{4} \sin(10 j \pi x) \sin(10 j \pi y).$$

Figure 2 shows the initial error on the left and the absolute values of the scaled
Fourier coefficients of the error on the right; the coefficients are obtained by applying
the 2D sine transform to the error on the mesh. Figure 3 shows the error in the first
V-cycle, after the coarse grid correction (top) and after the post-smoothing (bottom).
In each case, the error is shown in “physical” space (left) and in Fourier space (right).

Figures 4 and 5 are the same as Figure 3 except that in Figures 4 and 5 the errors
are shown in the second and third iterations, respectively.

These figures show how the coarse grid correction and the smoother complement 
each other by reducing the low frequency and high frequency error components, re- 
spectively. 
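The complementary behaviour can be illustrated with a much simpler setting than Demo 2. The Python sketch below is an assumption-laden stand-in: it uses a 1D model problem and weighted Jacobi (weight 2/3) rather than the 2D Gauss-Seidel smoother of the demo, and all names are hypothetical. It applies four smoothing sweeps to one low-frequency and one high-frequency sine mode of the error and records how much of each survives.

```python
import math


def weighted_jacobi_error_sweep(e, omega=2.0 / 3.0):
    """One weighted Jacobi sweep acting on the error for tridiag(-1, 2, -1).

    The error is assumed to vanish at both ends of the interval.
    """
    n = len(e)
    new = [0.0] * n
    for i in range(n):
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < n - 1 else 0.0
        new[i] = (1.0 - omega) * e[i] + omega * 0.5 * (left + right)
    return new


def sine_mode(k, n):
    """Discrete sine mode number k on n interior points."""
    return [math.sin(k * math.pi * (i + 1) / (n + 1)) for i in range(n)]


def max_norm(vec):
    return max(abs(x) for x in vec)


n = 63
low, high = sine_mode(2, n), sine_mode(48, n)
for _ in range(4):
    low = weighted_jacobi_error_sweep(low)
    high = weighted_jacobi_error_sweep(high)

low_remaining = max_norm(low)    # smooth mode: barely reduced by smoothing
high_remaining = max_norm(high)  # oscillatory mode: strongly damped
```

The smooth mode is left almost untouched by the smoother, which is why a coarse grid correction is needed to remove it.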

OBTAINING AND INSTALLING MGLab 

MGLab V1.0 is currently available via anonymous ftp to casper.cs.yale.edu in
the directory /mgnet/Codes/mglab. After the tar file is uncompressed, it should be
untarred in a subdirectory such as ~myname/matlab/MGLab. To run MGLab, simply
change to your MGLab directory, start up Matlab, and type MGLab.

Comments and suggestions for improvements to the code are welcome; we plan to 
release future versions of MGLab that incorporate enhancements and bug fixes.

[Figure 2 panels: “Initial error” (left); “Absolute value of initial error in Fourier space” (right).]

Figure 2: Output from Demo 2. Initial error for Poisson’s equation on a 49 x 49 grid.
The two-grid algorithm was used, with Gauss-Seidel smoothing with $(\nu_1, \nu_2) = (0, 4)$,
half-weighting, and cubic interpolation. The error is shown on the left, and the 2D
sine transform of the error on the right.

[Figure 3 panels: “Error after coarse grid correction, iter = 1” (top left); “Error after post-smoothing, iter = 1” (bottom left); “Absolute value of error in Fourier space” (right, both rows).]

Figure 3: Output from Demo 2. Error in the first V-cycle, after the coarse grid
correction (top) and after the post-smoothing (bottom). As in Figure 2, the error in
physical space is shown on the left, and the error in Fourier space is shown on the
right.

[Figure 5 panels: “Error after coarse grid correction, iter = 3” (top left, axis scale x 10^-3); “Error after post-smoothing, iter = 3” (bottom left, axis scale x 10^-4); “Absolute value of error in Fourier space” (right, both rows).]

Figure 5: Output from Demo 2 (same as Figure 3, for the third V-cycle).

A FULL MULTI-GRID METHOD FOR THE SOLUTION OF THE
CELL VERTEX FINITE VOLUME CAUCHY-RIEMANN EQUATIONS*


A. Borzi, K.W. Morton, E. Still, and M. Vanmaele 
Oxford University Computing Laboratory 
Numerical Analysis Group 
Wolfson Building, Parks Road 
Oxford, England OX1 3QD

SUMMARY 

The system of inhomogeneous Cauchy-Riemann equations defined on a square domain and 
subject to Dirichlet boundary conditions is considered. This problem is discretised by using the 
cell vertex finite volume method on quadrilateral meshes. The resulting algebraic problem is 
overdetermined and the solution is defined in a least squares sense. By this approach a consistent 
algebraic problem is obtained which differs from the original one by $O(h^2)$ perturbations of the
right-hand side. 

A suitable cell-based convergent smoothing iteration is presented which is naturally linked to 
the least squares formulation. Hence, a standard multi-grid algorithm is reported which combines 
the given smoother and cell-based transfer operators. Some remarkable reduction properties of 
these operators are shown. 

A full multi-grid method is discussed which solves the discrete problem to the level of 
truncation error by employing one multi-grid cycle at each current level of discretisation. 

Experiments and applications of the full multi-grid scheme are presented. 

INTRODUCTION 

We discuss a full multi-grid algorithm for the numerical solution of the system of 
inhomogeneous Cauchy-Riemann equations. This algorithm has been formulated in [1]. The 
Cauchy-Riemann equations are discretised by using a cell vertex finite volume method. We 
consider the continuous problem defined on a square subject to Dirichlet boundary conditions. 
Square cells are used for the discretisation. 

The motivation for the study of the Cauchy-Riemann system is that it provides a suitable 
model problem to develop a general multi-grid method for the solution of elliptic flow equations 
when they are discretised by using a cell vertex finite volume scheme. In this respect, the 
Cauchy-Riemann equations are the first model in the hierarchy of these fluid flow problems. In 
particular, it has been clearly shown that the elliptic part of the inviscid incompressible Euler 
problem is given by the set of Cauchy-Riemann equations [2]. Thus, for example, the present 
algorithm combined with an appropriate hyperbolic solver would provide an efficient solution 
method for that inviscid flow. 

*This work was financed in part by HCM contract CHRX-CT93-0042 and in part by SERC.

In fact, this idea has been pursued since the work of Brandt and Dinar [3] (see, for example,
[4, 5, 2, 6]), where the Cauchy-Riemann equations are taken as a first example of an elliptic
system. In [3], for this model problem, a full multi-grid method is developed. Then, the
techniques developed for this case are extended to the steady-state Stokes equations and the
incompressible Navier-Stokes equations.

However, such methods are constructed to approximate elliptic equations discretised on 
staggered grids. On the other hand, we want efficient algorithms which solve fluid flow problems 
discretised by using cell vertex finite volume schemes. Hence the need to re-develop the multi-grid 
method for problems resulting from the cell vertex discretisation. This is not a mere adaptation of 
the known techniques, since the peculiarity of the cell vertex scheme renders the previous methods 
unsuitable for the present task. 

In fact, when a cell vertex finite volume discretisation is used, there is generally the problem of 
how to define a suitable iterative scheme. In a cell vertex approach the resulting equations are 
cell-based, while the unknowns are node-based. Therefore there is not a one-to-one 
correspondence between unknowns and equations which can be inverted to provide a node-based 
iterative scheme. To circumvent this problem the so-called Kaczmarz iterative scheme was 
proposed [7, 8] and applied in [9], but it proved inefficient as a solver. The use of the
Kaczmarz relaxation is natural in the context of cell vertex discretisation. In fact, this type of
approximation, when applied to a first order elliptic system, usually results in an overdetermined
system for which a least squares approach is necessary. The Kaczmarz iteration is then equivalent 
to the Gauss-Seidel method applied to the normal equations. We know, in addition, that this 
relaxation method can be used as a smoother in a multi-grid algorithm (see, for example, [10, 11]). 

It is also interesting to compare the Kaczmarz relaxation with another, widely used, method to 
iteratively solve a cell vertex system of equations, that is, the generalised Lax-Wendroff scheme. 
See, e.g., [12]. This technique is based on time-stepping the (artificial) unsteady problem, derived 
from the original one by adding a partial derivative in time of the unknown variables. Then, on 
each node, a new value for the solution vector is obtained from the previous one by Taylor series 
expansion in time, up to second order terms. We claim that the second order term in this Taylor 
expansion is equivalent to the Kaczmarz relaxation. 

For these reasons we shall employ a Kaczmarz relaxation as a smoother and combine it with a 
cell-based transfer operator of the residuals and a node-based prolongation operator of the 
unknown variables to obtain a fast multi-grid solver. 

In the next section we define the differential problem to be solved, that is, the inhomogeneous 
Cauchy-Riemann equations on a square, subject to Dirichlet boundary conditions. We discretise 
this problem by using the cell vertex finite volume method on a square mesh. The resulting linear 
system is handled through a least squares approach. This method is then used in the third section
in order to develop an iterative scheme which is known as the Kaczmarz relaxation. In the fourth 
section, this iterative method is used in combination with a cell-based residual transfer operator, 
in a multi-grid (MG) cycle. As shown in the section of numerical experiments, the corresponding 
full multi-grid method solves the discrete problem to the level of the truncation error in just a few 
work units. Then by using a suitable modification of the Kaczmarz relaxation we shall give an 
application of the FMG method on non-uniform grids.

THE CAUCHY-RIEMANN EQUATIONS AND THEIR CELL VERTEX DISCRETISATION


We consider the system of Cauchy-Riemann equations

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = f^{(1)}(x, y), \qquad \frac{\partial u}{\partial y} - \frac{\partial v}{\partial x} = f^{(2)}(x, y), \qquad (1)$$

in a square domain $\Omega$, where u(x, y) and v(x, y) are the unknown functions, and $f^{(1)} \in L^2(\Omega)$ and
$f^{(2)} \in L^2(\Omega)$ represent the source terms. The following Dirichlet boundary conditions are
prescribed on the boundary $\partial\Omega$:

$$(u(P), v(P))_n = G(P), \qquad P = (x, y) \in \partial\Omega, \qquad (2)$$

where $(u, v)_n$ denotes the component of the vector (u, v) normal to the boundary in the outward
direction. The equations (1) with (2) represent a regular elliptic system [13]. The well-posedness
of the problem follows from the compatibility condition

$$\int_{\Omega} f^{(1)} \, dx \, dy = \int_{\partial\Omega} G \, ds. \qquad (3)$$

If (3) holds then the equations (1), with the boundary conditions (2), have a unique solution.

In order to discretise the problem (1), (2), we assume that the domain $\Omega$ is partitioned by a
uniform mesh of quadrilateral cells, whose mesh size is h. Each vertex of this grid will be labelled
by (i, j), i, j = 1, ..., N. We denote $u(x_i, y_j) = u_{i,j}$ and $v(x_i, y_j) = v_{i,j}$, where $x_i = (i - 1) h$ and
$y_j = (j - 1) h$. The cell vertex discretisation of the system (1) on these grids follows by
integrating the Cauchy-Riemann equations over each cell $\Omega_{ij} = [i, i + 1] \times [j, j + 1]$ and by using
Gauss’ theorem to convert the integrals into line integrals along the cell edges, which are then
discretised using the trapezoidal rule. In this way, the following cell vertex Cauchy-Riemann
equations are obtained:

$$\frac{1}{2h} \left( -u_{i,j} - u_{i,j+1} + u_{i+1,j} + u_{i+1,j+1} \right) + \frac{1}{2h} \left( -v_{i,j} + v_{i,j+1} - v_{i+1,j} + v_{i+1,j+1} \right) = f^{(1)}_{ij}, \qquad (4)$$

$$\frac{1}{2h} \left( -u_{i,j} + u_{i,j+1} - u_{i+1,j} + u_{i+1,j+1} \right) + \frac{1}{2h} \left( v_{i,j} + v_{i,j+1} - v_{i+1,j} - v_{i+1,j+1} \right) = f^{(2)}_{ij}, \qquad (5)$$

for i, j = 1, ..., N - 1, and $f^{(l)}_{ij} = \frac{1}{h^2} \int_{\Omega_{ij}} f^{(l)} \, dx \, dy$, l = 1, 2. The boundary conditions are

$$u_{1,j} = G_{1,j}, \qquad u_{N,j} = G_{N,j}, \qquad j = 1, \ldots, N, \qquad (6)$$

$$v_{i,1} = G_{i,1}, \qquad v_{i,N} = G_{i,N}, \qquad i = 1, \ldots, N. \qquad (7)$$


For the above cell vertex discretisation we have 2 x ( N x N) unknowns, 2 x (( N — 1) x (N — 1)) 
cell equations, 4 x N given boundary values. Therefore we have 2 more equations than unknowns. 
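The second-order accuracy of the cell vertex operators can be checked numerically on a single cell. The following Python sketch is an illustration only: the test fields u and v, the evaluation point, and all function names are assumptions, and the discrete operators are compared with the source terms at the cell centre rather than with their cell averages (the two differ by a second-order term, so the observed order is unchanged).

```python
import math


def u(x, y):
    return math.sin(x + 2.0 * y)


def v(x, y):
    return math.cos(2.0 * x - y)


def f1(x, y):
    """Exact divergence u_x + v_y of the test fields."""
    return math.cos(x + 2.0 * y) + math.sin(2.0 * x - y)


def f2(x, y):
    """Exact curl u_y - v_x of the test fields."""
    return 2.0 * math.cos(x + 2.0 * y) + 2.0 * math.sin(2.0 * x - y)


def cell_vertex_operators(x0, y0, h):
    """Left-hand sides of (4) and (5) on the cell [x0, x0+h] x [y0, y0+h]."""
    u00, u01, u10, u11 = u(x0, y0), u(x0, y0 + h), u(x0 + h, y0), u(x0 + h, y0 + h)
    v00, v01, v10, v11 = v(x0, y0), v(x0, y0 + h), v(x0 + h, y0), v(x0 + h, y0 + h)
    div = ((-u00 - u01 + u10 + u11) + (-v00 + v01 - v10 + v11)) / (2.0 * h)
    curl = ((-u00 + u01 - u10 + u11) + (v00 + v01 - v10 - v11)) / (2.0 * h)
    return div, curl


def truncation_error(h, xc=0.3, yc=0.4):
    """Errors of (4) and (5) on a cell of size h centred at (xc, yc)."""
    div, curl = cell_vertex_operators(xc - h / 2.0, yc - h / 2.0, h)
    return abs(div - f1(xc, yc)), abs(curl - f2(xc, yc))


e_coarse = truncation_error(0.1)
e_fine = truncation_error(0.05)
# Halving h should reduce both errors by a factor close to 4.
ratios = (e_coarse[0] / e_fine[0], e_coarse[1] / e_fine[1])
```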

By using the compatibility condition it is possible to reduce the number of equations by one.
But we still have an overdetermined system.

In the following we will discuss the least squares approach which allows us to define a unique
solution to the system of the cell vertex Cauchy-Riemann equations. For this purpose it is
convenient to introduce a compact notation. By A we denote the $(2N^2 - 4N + 2) \times (2N^2 - 4N)$
matrix of coefficients, which is derived from (4) and (5). In fact, the first $(N - 1)^2$ rows relate to
the discrete divergence equation, and the remaining $(N - 1)^2$ relate to the curl equation. The
boundary values are incorporated in the right-hand side of the system. Thus any element of the
right-hand side is of the form $f^{(l)}_{ij}$ + boundary values. The right-hand side itself will be
denoted by $f = (f^{(1)}, f^{(2)})^T$; f is the column vector whose first $(N - 1)^2$ elements are the values
of $f^{(1)}$ ordered lexicographically, and the last $(N - 1)^2$ elements are the values of $f^{(2)}$ ordered in
the same way. With this notation the compatibility condition (3) becomes

$$\sum_{i,j=1}^{N-1} f^{(1)}_{ij} = 0, \qquad (8)$$

which shows that the sum of the first $(N - 1)^2$ rows is zero. A similar property is observed for the
second set of rows: their chequerboard combination is equal to zero. This condition requires

$$\sum_{i,j=1}^{N-1} (-1)^{i+j} f^{(2)}_{ij} = 0. \qquad (9)$$

Finally we denote the solution vector by $w = (u \; v)^T$. This is a column vector of length
$2N^2 - 4N$ whose first $N^2 - 2N$ components represent the value of the solution u on the vertices of
the mesh, and the remaining components represent the solution v, both ordered lexicographically.
Hence, the problem (4), (5), (6) and (7) can be restated as

$$A w = f. \qquad (10)$$

Since this algebraic problem is overdetermined a solution can only be defined in a least squares
sense, that is, by solving the normal equations

$$A^T A w = A^T f. \qquad (11)$$

For the uniqueness of the solution the columns of the matrix A have to be linearly independent. 
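To make the least squares definition concrete, the Python sketch below solves the normal equations for a tiny overdetermined system. The 3 x 2 matrix and right-hand side are arbitrary illustrations, not the cell vertex matrix, and the helper name `normal_equations_2col` is hypothetical. The residual of the least squares solution is orthogonal to the columns of A, which is exactly the statement of the normal equations.

```python
def normal_equations_2col(A, f):
    """Solve A^T A w = A^T f for a matrix A with two columns (Cramer's rule)."""
    a11 = sum(row[0] * row[0] for row in A)
    a12 = sum(row[0] * row[1] for row in A)
    a22 = sum(row[1] * row[1] for row in A)
    b1 = sum(row[0] * fi for row, fi in zip(A, f))
    b2 = sum(row[1] * fi for row, fi in zip(A, f))
    # The determinant is nonzero exactly when the two columns are independent.
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


# Three equations, two unknowns: no exact solution exists.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f = [1.0, 2.0, 4.0]
w = normal_equations_2col(A, f)

residual = [fi - (row[0] * w[0] + row[1] * w[1]) for row, fi in zip(A, f)]
At_residual = [sum(row[k] * ri for row, ri in zip(A, residual)) for k in range(2)]
```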

As an example, let us take N = 3, which is the coarsest grid to be used in a MG cycle 
(described later). Equations (4) and (5), together with the boundary conditions (6) and (7), 
provide 8 equations for 6 unknowns. In addition we have the compatibility conditions (8) and (9). 

By solving the resulting system one obtains the solution values u and v, such that the residuals
of all original equations are zero. However if the condition (9) is not satisfied, the least squares
formulation still provides a unique solution to the problem

$$A w = P_A f, \qquad (12)$$

where

$$P_A = A (A^T A)^{-1} A^T \qquad (13)$$

is the projection matrix onto the column space of A.

It is worth noting the relationship between the two cases when the conditions (8) and (9) are
satisfied. In this case the least squares formulation is equivalent to the combination of the discrete
divergence equations (defined on the same macro-cell, which is a square cell containing four cells)
by using the following pattern (where + or − means that the equation enters in the combination
with a +1 or −1 multiplicative factor):


+ - - - - + 

+ - + + + - • 

The discrete curl equations are combined as 

+ - - - + + 

+ - + + + + • 


(14) 


(15) 


The system of equations thus constructed coincides with $A^T A w = A^T f$. Notice that the idea of
reducing the test space (of piecewise constant functions), described, for example in [14, 13], 
defines the same patterns depicted in (14) and (15). 

In the remainder of this section we study $P_A$ in detail. We find that, as we refine the mesh size
h, $P_A$ tends to the identity operator, as expected. This analysis is necessary to define the
truncation error of the least squares equation (12). In fact the truncation error due to the cell
vertex discretisation of the differential operators in (1) originates from the use of the trapezoidal
rule

$$\int_a^b f(x) \, dx = \frac{1}{2} \left( f(a) + f(b) \right) (b - a) - \frac{1}{12} (b - a)^3 f''(\xi), \qquad (16)$$

where $\xi \in (a, b)$. Thus in a cell of size h this summation introduces an error of order $O(h^3)$ on
each side of the cell, and the global truncation will be a combination of all these contributions. In
order to represent this error on each cell as a constant multiplied by some power of h, we derive
each contribution by a Taylor expansion with respect to the center of the cell. This gives for the
cell vertex Cauchy-Riemann equations a truncation error of order $O(h^2)$.

The analysis of $P_A$ is necessary to define the approximation of the right-hand side of the
discrete problem that we actually solve, with respect to that of the original problem (4) and (5).
Fortunately it is possible to give $P_A$ explicitly, in a compact way. It has the block structure

$$P_A = \begin{bmatrix} P_A^{(1)} & 0 \\ 0 & P_A^{(2)} \end{bmatrix}, \qquad (17)$$

where each $P_A^{(l)}$ is idempotent and symmetric. Let us first consider $P_A^{(1)}$. For simplicity of notation
denote by $q = 1/(N - 1)^2$. Thus, by using (8), we have

$$(P_A^{(1)} f^{(1)})_k = (f^{(1)})_k - q \sum_{i,j=1}^{N-1} f^{(1)}_{ij} = (f^{(1)})_k, \qquad k = 1, \ldots, (N - 1)^2; \qquad (18)$$

hence, because of the compatibility condition, $P_A^{(1)}$ acts on $f^{(1)}$ as an identity map.

It should be clear what to expect from $P_A^{(2)}$: when $f^{(2)}$ satisfies (9), $P_A^{(2)}$ acts as an identity
operator on $f^{(2)}$. We have

$$(P_A^{(2)} f^{(2)})_k = (f^{(2)})_k - q \, (-1)^{i_k + j_k} \sum_{i,j=1}^{N-1} (-1)^{i+j} f^{(2)}_{ij}, \qquad k = (N - 1)^2 + 1, \ldots, 2(N - 1)^2, \qquad (19)$$

where $(i_k, j_k)$ is the cell associated with the index k.

Because $f^{(2)}$, in principle, is not required to satisfy (9) we must evaluate the perturbation
$(P_A^{(2)} f^{(2)} - f^{(2)})$. For this purpose, let us introduce the two-dimensional chequerboard function on
$\Omega_h$. Denoting the characteristic function on $\Omega_{ij}$ by $\chi_{ij}$, the chequerboard function is

$$\chi_h(x, y) = \sum_{i,j=1}^{N-1} (-1)^{i+j} \chi_{ij}(x, y). \qquad (20)$$

One can prove that $\chi_h$ weakly converges to zero (as $h \to 0$) in $L^2(\Omega)$ (see [13]).

To simplify the discussion which follows, without affecting the general validity of the result, we
take $\Omega = (0, 1)^2$ and assume homogeneous boundary conditions. Let us denote by $(\cdot, \cdot)$ the
Euclidean inner product; then we can rewrite (19) as follows:

$$(P_A^{(2)} f^{(2)})_k = (f^{(2)})_k - h^2 \, (\chi_h, f^{(2)}) \, (\chi_h)_k, \qquad k = (N - 1)^2 + 1, \ldots, 2(N - 1)^2. \qquad (21)$$

We have the following theorem (for the proof see [1, 13]).

Theorem 1 Suppose that $f^{(2)} \in L^2(\Omega)$, and let $N = 2^{\ell} + 1$, with $\ell$ some positive integer. Then

l(«,/< 2 >)|<5l|^||| Mn) . (22) 


Thus the perturbation $q \sum_{i,j=1}^{N-1} (-1)^{i+j} f^{(2)}_{ij}$ due to the least squares approach is of order $O(h^2)$.
By this approach we obtain a consistent algebraic problem which differs from the original one by
an $O(h^2)$ perturbation of the right-hand side.

The stability and convergence analysis of the cell vertex approximation of the Cauchy-Riemann
equations is presented in [13]. There we show that the cell vertex approximation is stable and
second-order convergent in an appropriate Z/ 1 - norm. In particular the model problem which we 
also consider here is studied there as well. This gives an overdetermined system, for which the 
idea of reducing the test space is adopted, and stability and convergence properties follow. Similar 
convergence results are presented in [15] by using a least squares approach. 


AN ITERATIVE SCHEME 


From the discrete Cauchy-Riemann equations it is clear that there is not a one-to-one
correspondence between nodal values and equations. This means that a possible pointwise 
iteration must be constructed based on the cells, and thus it involves more than one cell. This is 
the case, for example, with the Lax-Wendroff iteration. Here we present a pointwise iteration 
procedure. 

As in the previous section we start by analysing the simplest case (N = 3). This case actually 
appears in our computations, since it represents the coarsest problem in a multi-grid cycle. In the 
standard MG approach to solve simple model problems using, for example, a finite difference 
discretisation, the algebraic equations on the coarsest grid are solved exactly, and the ‘solver’ 
there coincides with one step of the iteration procedure (e.g., the pointwise Gauss-Seidel scheme).
We now try to reproduce these aspects of an iteration for the cell vertex finite volume
Cauchy-Riemann equations.

The variables which must be computed on the coarsest grid are represented as they appear on
the grid:

$$\begin{array}{ccc} & u_{2,3} & \\ v_{1,2} & u_{2,2}, \; v_{2,2} & v_{3,2} \\ & u_{2,1} & \end{array} \qquad (23)$$

The iteration on the grid N = 3 must be capable of solving for all these variables in one step.
Therefore the iteration is explicitly given by solving (11). The solution is given by

$$u_{2,1} = (u_{1,2} + u_{3,2} + v_{1,1} - v_{3,1})/2 + (f^{(1)}_{1,1} - f^{(1)}_{2,1} - f^{(2)}_{1,1} - f^{(2)}_{2,1}) \, h/2, \qquad (24)$$

$$u_{2,2} = (u_{1,1} + u_{1,3} + u_{3,1} + u_{3,3})/4 + (f^{(1)}_{1,1} + f^{(1)}_{1,2} - f^{(1)}_{2,1} - f^{(1)}_{2,2} + f^{(2)}_{1,1} - f^{(2)}_{1,2} + f^{(2)}_{2,1} - f^{(2)}_{2,2}) \, h/4, \qquad (25)$$

$$u_{2,3} = (u_{1,2} + u_{3,2} - v_{1,3} + v_{3,3})/2 + (f^{(1)}_{1,2} - f^{(1)}_{2,2} + f^{(2)}_{1,2} + f^{(2)}_{2,2}) \, h/2, \qquad (26)$$

$$v_{1,2} = (u_{1,1} - u_{1,3} + v_{2,1} + v_{2,3})/2 + (f^{(1)}_{1,1} - f^{(1)}_{1,2} + f^{(2)}_{1,1} + f^{(2)}_{1,2}) \, h/2, \qquad (27)$$

$$v_{2,2} = (v_{1,1} + v_{1,3} + v_{3,1} + v_{3,3})/4 + (f^{(1)}_{1,1} - f^{(1)}_{1,2} + f^{(1)}_{2,1} - f^{(1)}_{2,2} - f^{(2)}_{1,1} - f^{(2)}_{1,2} + f^{(2)}_{2,1} + f^{(2)}_{2,2}) \, h/4, \qquad (28)$$

$$v_{3,2} = (-u_{3,1} + u_{3,3} + v_{2,1} + v_{2,3})/2 + (f^{(1)}_{2,1} - f^{(1)}_{2,2} - f^{(2)}_{2,1} - f^{(2)}_{2,2}) \, h/2. \qquad (29)$$

Hence by substituting the values of the variables as given above, the coarsest problem is solved 
(in the least squares sense). Now we suppose that the mesh is refined by halving h. First, we 
notice that (24), (26), (27) and (29) provide the relaxation scheme for the boundary values, in the 
appropriate part of the domain’s contour. The remaining two, (25) and (28), are suitable to relax 
the variables u and v in the interior of the domain. Note that they coincide with the pointwise 
Gauss-Seidel step for the Laplacian discretised with the usual skewed five point finite difference
stencil. Actually, they reflect the fact that it is possible to combine the cell vertex 
Cauchy-Riemann equations (4) and (5) to obtain such a stencil. 
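The explicit coarse-grid solution can be checked numerically. The Python sketch below is an illustration with hypothetical names and random data: it fills a 3 x 3 grid with arbitrary boundary values and source terms, sets the six unknowns by the formulas (24) to (29), and then evaluates the residual of the normal equations (11) for every unknown, i.e. the sum over the neighbouring cells of each cell residual of (4) and (5) weighted by the coefficient of that unknown. All six values should vanish up to rounding.

```python
import random


def signs(di, dj):
    """Signs of u and v at the cell corner (i+di, j+dj) in (4) and (5)."""
    u_div = 1.0 if di == 1 else -1.0
    v_div = 1.0 if dj == 1 else -1.0
    u_curl = 1.0 if dj == 1 else -1.0
    v_curl = -1.0 if di == 1 else 1.0
    return u_div, v_div, u_curl, v_curl


def cell_residuals(u, v, f1, f2, i, j, h):
    """Residuals of (4) and (5) on the cell with lower-left vertex (i, j)."""
    div = curl = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            u_div, v_div, u_curl, v_curl = signs(di, dj)
            node = (i + di, j + dj)
            div += u_div * u[node] + v_div * v[node]
            curl += u_curl * u[node] + v_curl * v[node]
    return f1[i, j] - div / (2.0 * h), f2[i, j] - curl / (2.0 * h)


random.seed(1)
h = 0.5
nodes = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3)]
cells = [(i, j) for i in (1, 2) for j in (1, 2)]
u = {n: random.uniform(-1, 1) for n in nodes}  # i = 1, 3: boundary data
v = {n: random.uniform(-1, 1) for n in nodes}  # j = 1, 3: boundary data
f1 = {c: random.uniform(-1, 1) for c in cells}
f2 = {c: random.uniform(-1, 1) for c in cells}

# Formulas (24)-(29): each unknown depends only on boundary data and sources.
u[2, 1] = (u[1, 2] + u[3, 2] + v[1, 1] - v[3, 1]) / 2 \
    + h * (f1[1, 1] - f1[2, 1] - f2[1, 1] - f2[2, 1]) / 2
u[2, 2] = (u[1, 1] + u[1, 3] + u[3, 1] + u[3, 3]) / 4 \
    + h * (f1[1, 1] + f1[1, 2] - f1[2, 1] - f1[2, 2]
           + f2[1, 1] - f2[1, 2] + f2[2, 1] - f2[2, 2]) / 4
u[2, 3] = (u[1, 2] + u[3, 2] - v[1, 3] + v[3, 3]) / 2 \
    + h * (f1[1, 2] - f1[2, 2] + f2[1, 2] + f2[2, 2]) / 2
v[1, 2] = (u[1, 1] - u[1, 3] + v[2, 1] + v[2, 3]) / 2 \
    + h * (f1[1, 1] - f1[1, 2] + f2[1, 1] + f2[1, 2]) / 2
v[2, 2] = (v[1, 1] + v[1, 3] + v[3, 1] + v[3, 3]) / 4 \
    + h * (f1[1, 1] - f1[1, 2] + f1[2, 1] - f1[2, 2]
           - f2[1, 1] - f2[1, 2] + f2[2, 1] + f2[2, 2]) / 4
v[3, 2] = (-u[3, 1] + u[3, 3] + v[2, 1] + v[2, 3]) / 2 \
    + h * (f1[2, 1] - f1[2, 2] - f2[2, 1] - f2[2, 2]) / 2

residuals = {c: cell_residuals(u, v, f1, f2, c[0], c[1], h) for c in cells}


def normal_equation_residual(node, is_u):
    """Row of A^T (f - A w) belonging to the unknown u or v at this node."""
    total = 0.0
    for (i, j) in cells:
        di, dj = node[0] - i, node[1] - j
        if di in (0, 1) and dj in (0, 1):
            u_div, v_div, u_curl, v_curl = signs(di, dj)
            r_div, r_curl = residuals[i, j]
            if is_u:
                total += u_div * r_div + u_curl * r_curl
            else:
                total += v_div * r_div + v_curl * r_curl
    return total


unknowns = [((2, j), True) for j in (1, 2, 3)] + [((i, 2), False) for i in (1, 2, 3)]
max_gradient = max(abs(normal_equation_residual(n, is_u)) for n, is_u in unknowns)
```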

So we obtain the Gauss-Seidel (GS) iteration for $A^T A w = A^T f$, also called Kaczmarz
relaxation (see, e.g., [10]). Since $A^T A$ is a positive definite, symmetric matrix, the GS iteration
converges to the solution, in the least squares sense, of the discrete Cauchy-Riemann equations. 
The analysis of the smoothing property of this iteration is carried out in [10, 11]. 
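A Gauss-Seidel sweep on the normal equations can be carried out without ever forming the product of A with its transpose. The Python sketch below shows one way to do this, under stated assumptions: the function name, the dense list-of-rows storage, and the small 3 x 2 test system are all illustrative and unrelated to the cell vertex matrix. Each unknown is updated in turn so that its own normal equation is satisfied, and the residual of the original system is kept up to date.

```python
def gauss_seidel_normal_equations(A, f, w, sweeps):
    """Gauss-Seidel sweeps on A^T A w = A^T f, using only columns of A.

    The running residual r = f - A w of the original system is updated
    after every single-unknown correction.
    """
    m, n = len(A), len(w)
    r = [f[i] - sum(A[i][k] * w[k] for k in range(n)) for i in range(m)]
    for _ in range(sweeps):
        for k in range(n):
            column = [A[i][k] for i in range(m)]
            delta = (sum(c * ri for c, ri in zip(column, r))
                     / sum(c * c for c in column))
            w[k] += delta
            r = [ri - delta * c for ri, c in zip(r, column)]
    return w


# Overdetermined test system; its least squares solution is [9/7, 8/7].
A = [[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]
f = [2.0, 0.0, 4.0]
w = gauss_seidel_normal_equations(A, f, [0.0, 0.0], sweeps=40)
```

Because the sweeps converge to the least squares solution, the residual of the original overdetermined system does not tend to zero in general; only the residual of the normal equations does.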


Remark 1 After any iteration sweep the algebraic sum of the residuals for the divergence 
equations (4) is zero. Hence the compatibility condition for the corresponding residual equation for 
the solution error is satisfied. In fact, because of the (Dirichlet) boundary conditions, this error is
zero on the boundary. This is an important property in order to solve the cell vertex
Cauchy-Riemann equations by a multi-grid scheme.

A FULL MULTI-GRID METHOD


Since we have the iterative scheme at hand we now need to define the restriction and 
prolongation operators to construct a multi-grid algorithm. The features of such operators are 
dictated, in part, by the problem we wish to solve and by the choice of the relaxation procedure. 
Even though our approach is based on least squares, we want a multi-grid code which in principle 
can be defined by using the properties of the discrete Cauchy-Riemann equations without 
involving the computation of the normal equations. A pure least squares approach would mean 
the development of an algebraic multi-grid method which applies to $A^T A w = A^T f$. We instead
want an algorithm which works directly on Aw = f. The reason is that the resulting scheme can 
be easily adapted to more general problems. The first step was done in the previous section where 
we have defined a smoothing iteration. It has been obtained by solving the coarsest (least squares) 
problem. 

On finer grids the solution of the normal equations is considered only locally. Clearly we have
implicitly assumed a standard MG approach: the approximation of the differential
operator on any level is obtained by discretising it on that level. Let us consider, for the moment,
two grids only. The coarse grid approximation of the differential operator is given
by (4) and (5), with h replaced by H = 2h. To distinguish the two problems defined on different
grids we denote by $A^h$ and $A^H$ the cell vertex difference operators on $\Omega_h$ and $\Omega_H$, respectively. In
the same way, the vector functions w and f on a given grid $\Omega_h$ will be denoted by $w^h$ and $f^h$.

The definition of the transfer operator for the residuals is based on the way the finite volume 
method approximates the source terms. In the weak formulation provided by the cell vertex 
scheme, the source terms are discretised by integrating them on the given volume. Therefore the 
sum of the right hand sides of four discrete Cauchy-Riemann equations (of ‘div’ or ‘curl’ type) 
based on neighbouring cells with a common vertex provides the right hand side of a discrete 
Cauchy-Riemann equation based on the coarse cell $\Omega^H_{IJ}$, which contains the fine cells. For this
reason the transfer operator of the residuals from fine cells to a coarser one is defined by the
algebraic sum of the fine grid residuals contained in the coarse cell. That is, we have

$$(I_h^H r^h)_{IJ} = (r_{ij} + r_{i+1,j} + r_{i,j+1} + r_{i+1,j+1})/4, \qquad (30)$$

where (i, j) and (I, J) refer to the same space point, and $r_{ij}$ is the residual of the cell vertex
Cauchy-Riemann equations (4) or (5) on the cell $\Omega_{ij}$.

Also the definition of the prolongation operator follows in a natural way from the properties of 
the present discrete problem. Let us notice that, after a sufficient number of smoothing sweeps, 
$\|e\|_1 = (A^T A e, e) \approx 0$. Thus, from the algebraic multi-grid approach [11, 10], we can use the
equation $A^T A e = 0$ and invert it to construct the prolongation operator. Therefore, by
construction, the repeated application of the smoothing iteration produces an approximation
whose error tends to lie in the range of the interpolation. By using $A^T A e = 0$ we obtain the
interpolation formula for those fine points which are situated at the center of the coarse cell. The
interpolated value of a variable on this grid point will be the mean value of those at the
neighbouring vertices of the coarse cell containing the point. The remaining fine grid variables, on
the edges of the coarse cells, are obtained by linear interpolation between the nearest two coarse
variables. So we have $I_H^h$, the nine-point prolongation [16], symbolised by the stencil

$$\begin{bmatrix} 1/4 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 1/4 \end{bmatrix}. \qquad (31)$$

It is interesting to notice the remarkable reduction properties of the transfer operators just
described. It can be shown that

$$A^H = I_h^H A^h I_H^h, \qquad (32)$$

which means that the coarse grid matrix problem that has been defined using the standard
approach is obtainable from the fine grid matrix of coefficients and the given transfer operators,
following a Galerkin approach. This property is very important. Since a sufficient number of
relaxation sweeps produces an approximate solution $w^h$ whose error $e^h$ lies in the range of the
prolongation operator, by using (32) the fine problem is reduced to one of smaller size. The
(least squares) solution of the coarse grid equation

$$A^H e^H = I_h^H (f^h - A^h w^h) \qquad (33)$$

provides a good approximation for the fine grid error and is used in the coarse level correction
$w^h := w^h + I_H^h e^H$.

A two level cycle is defined as the application of $\nu_1$ pre-smoothing sweeps on the fine level,
followed by the coarse level correction and $\nu_2$ post-smoothing sweeps. If one uses the same method
to determine $e^H$ in (33) and the process is repeated recursively until the coarsest level is reached,
then a multi-grid method is obtained. 
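The Galerkin property (32) can be verified numerically for the transfer operators described above. The following Python sketch is an illustration only, with hypothetical function names and 0-based indices: it prolongs random coarse nodal values with the nine-point stencil (31), applies the fine-grid operators (4) and (5) on every fine cell, restricts the cell values with (30), and compares the result with the coarse-grid operators applied directly to the coarse values.

```python
import random


def cell_operators(u, v, i, j, h):
    """Left-hand sides of (4) and (5) on the cell with lower-left vertex (i, j)."""
    div = (-u[i][j] - u[i][j + 1] + u[i + 1][j] + u[i + 1][j + 1]
           - v[i][j] + v[i][j + 1] - v[i + 1][j] + v[i + 1][j + 1]) / (2.0 * h)
    curl = (-u[i][j] + u[i][j + 1] - u[i + 1][j] + u[i + 1][j + 1]
            + v[i][j] + v[i][j + 1] - v[i + 1][j] - v[i + 1][j + 1]) / (2.0 * h)
    return div, curl


def apply_operator(u, v, h):
    """(div, curl) cell values for nodal arrays u and v."""
    n = len(u) - 1
    return [[cell_operators(u, v, i, j, h) for j in range(n)] for i in range(n)]


def prolong(coarse):
    """Nine-point (bilinear) prolongation of nodal values, stencil (31)."""
    nc = len(coarse)
    nf = 2 * nc - 1
    fine = [[0.0] * nf for _ in range(nf)]
    for i in range(nf):
        for j in range(nf):
            i0, j0 = i // 2, j // 2
            i1, j1 = i0 + i % 2, j0 + j % 2
            fine[i][j] = 0.25 * (coarse[i0][j0] + coarse[i1][j0]
                                 + coarse[i0][j1] + coarse[i1][j1])
    return fine


def restrict(fine_cells):
    """Cell-based restriction (30): mean of the four fine cells in a coarse cell."""
    nc = len(fine_cells) // 2
    coarse = []
    for I in range(nc):
        row = []
        for J in range(nc):
            block = [fine_cells[2 * I + a][2 * J + b] for a in (0, 1) for b in (0, 1)]
            row.append(tuple(sum(c[k] for c in block) / 4.0 for k in (0, 1)))
        coarse.append(row)
    return coarse


random.seed(0)
nc, H = 3, 0.5
u_c = [[random.uniform(-1, 1) for _ in range(nc)] for _ in range(nc)]
v_c = [[random.uniform(-1, 1) for _ in range(nc)] for _ in range(nc)]

direct = apply_operator(u_c, v_c, H)
galerkin = restrict(apply_operator(prolong(u_c), prolong(v_c), H / 2.0))
max_difference = max(abs(direct[I][J][k] - galerkin[I][J][k])
                     for I in range(nc - 1) for J in range(nc - 1) for k in (0, 1))
```

The agreement is exact up to rounding because the trapezoidal rule behind (4) and (5) integrates the bilinear interpolant without error.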

At this stage there are some important points to be discussed. As one can notice, the least 
squares formulation is mainly used locally to develop a suitable smoothing iteration. The resulting 
relaxation scheme solves the normal equations. On the other hand, the remaining components of 
the multi-grid algorithm defined here are based on the original overdetermined Cauchy-Riemann 
equations. To make these points more clear we report in Figures 1 and 2 the values of the $L_2$
norm of the residuals of equations (4) and (5), and (11) as a function of work units (i.e., the
computational work invested to produce these residuals). In Figure 1 the simple relaxation is
applied to solve the discrete problem on a given grid (this example is stated in the section of 
numerical experiments). The dotted line represents the residual norm of the cell vertex 
Cauchy-Riemann equations. The continuous line represents the residual of the normal equations. 
The same quantities are pictured in Figure 2 which reports the convergence history relative to the 
cyclic application of the MG scheme. The multi-grid method accelerates greatly the convergence 
of the Kaczmarz relaxation to the solution of the least squares problem. But the residual norm 
relative to the original Cauchy-Riemann equations converges to a non-zero value, since a solution 
for them does not exist. Because the relaxation and the coarse level correction are based on 
different equations, they are to some extent conflicting schemes. This fact appears in Figure 2 
where for sufficiently small residuals which turn out to be of the order of the truncation error on 
the finest grid, the MG convergence slows down. 

In order to define a full multi-grid scheme (see, e.g., [16]) we have to introduce another 
interpolation operator. We use a standard cubic interpolation operator. It is used to interpolate 
the solution to the problem on the level $l$, after $n$ multi-grid cycles, to the level $l + 1$, and so on
recursively until the finest level $M$ is reached and, finally, $n$ multi-grid cycles are performed on
level $M$. The resulting algorithm will be denoted by $n$-FMG. We shall test the $n$-FMG code
described here in the section on numerical experiments. An equivalent scheme which solves the
cell vertex Cauchy-Riemann equations based on triangles has also been tested, giving similar
results [1].

Figure 1: Convergence history for the $L_2$ residual norms relative to the cell vertex Cauchy-Riemann
equations (dotted line) and the normal equations (continuous line) when the Kaczmarz relaxation
is applied (for the u component).


NUMERICAL EXPERIMENTS 

In this section we report the results of some numerical experiments. As we have previously 
seen, the multi-grid iteration has a convergence rate which slows down after a large number of 
iterations. However, an optimal multi-grid method results when the MG cycle is used in
combination with a nested iteration technique, thus resulting in a full multi-grid scheme. In fact
we show that the full multi-grid scheme previously described is capable of solving the discrete
Cauchy-Riemann equations to the level of the truncation error $O(h^2)$ employing only one MG
cycle at each current level of discretisation. We consider the Cauchy-Riemann equations
discretised on the square (0, 2) x (0, 2). Some numerical parameters are fixed, namely, the coarsest
mesh size $h_1 = 1$; the number of intervals of the coarsest grid equals 2 in both directions. The
initial starting approximation is always the zero function (except on the boundary). 

The first example has been previously considered to obtain Figures 1 and 2 (employing five
levels); the source terms (34) and (35), integrated over $[x, x + h] \times [y, y + h]$, are given by

f (1) (x,y ) = -^r((a - b)(cos(by) - cos{b(h + y))) sm(ax)/(ab) 

+(a — b)(cos(by) — cos(b(h + y))) sin (a(h + x))/(ab)) , 

fW(x,y) = -p-((a + b) cos(ax)(sm(by) - sin (b(h + y)))/(ab) 

+(a + b) cos (a(h + x - ))(sin(6y) — sin(6(A + y)))/(ab)) . 

The exact solution and the boundary values are given by

$$u(x,y) = \sin(ax) \sin(by) , \qquad (36)$$

$$v(x,y) = \cos(ax) \cos(by) , \qquad (37)$$

where $a$ and $b$ are given parameters. The behaviours reported in Figures 1 and 2 correspond to
the case in which $a = 1$ and $b = 2$. Now let us consider $a = b = 1$, so that $f^{(1)} = 0$.

Figure 2: Convergence history for the $L_2$ residual norms relative to the cell vertex Cauchy-Riemann
equations (dotted line) and the normal equations (continuous line) when the multi-grid scheme is
used (for the u component).

We have seen that the multi-grid step has satisfactory convergence properties in the first few 
iterations whenever the initial approximation produces residuals of the cell vertex 
Cauchy-Riemann equations which are larger than those corresponding to the exact solution of the 
least squares problem. Therefore it is convenient to use the MG code to work within this limit, 
which suffices in order to obtain an efficient FMG algorithm. 


Table 1: The Behaviour of the $L_2$ Norm of the Solution Error for Various $n$-FMG, for the u and v
Components.

              u                        v
  M     1-FMG      2-FMG        1-FMG      2-FMG
  2     0.19(-1)   0.17(-1)     0.12(-1)   0.77(-2)
  3     0.36(-2)   0.32(-2)     0.23(-2)   0.20(-2)
  4     0.81(-3)   0.71(-3)     0.57(-3)   0.53(-3)
  5     0.19(-3)   0.17(-3)     0.15(-3)   0.14(-3)
  6     0.46(-4)   0.40(-4)     0.37(-4)   0.34(-4)
  WU    3.5        7.1          3.5        7.1

Table 1 clearly shows† that $n$-FMG with $n = 1$ is sufficient to solve the problem to the order of
the truncation error. The work invested in the FMG process is measured in work units (WU),
that is, the computational work of one relaxation sweep on the finest grid.

† In this table the power $10^{-6}$ is represented by (-6). We refer to the usual discrete $L_2$ norm of the error between
the numerical and the analytical solution.
The same experiment is then repeated with different values of a and b. We always observe the
behaviour described above and quadratic convergence of the numerical solution. 
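The quadratic convergence can be checked directly from the tabulated errors. The sketch below uses the 1-FMG errors for the u component from Table 1 and assumes that the mesh size is halved from one level to the next, so that the base-2 logarithm of successive error ratios estimates the order.

```python
import math

# L2 solution errors, u component, 1-FMG, levels M = 2, ..., 6 (Table 1).
errors_u_1fmg = [0.19e-1, 0.36e-2, 0.81e-3, 0.19e-3, 0.46e-4]

def observed_orders(errors):
    """log2 of successive error ratios, assuming h is halved between levels."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]

orders = observed_orders(errors_u_1fmg)
print([round(p, 2) for p in orders])
```

The estimates decrease towards 2 as the grid is refined, which is what second order accuracy would suggest.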

So far we have considered structured uniform grids. We now comment on how to generalise the 
present algorithm to solve the Cauchy-Riemann equations on non-uniform quadrilateral grids. For 
this purpose we consider the generalised Lax-Wendroff (LW) scheme (see [17, 12]). This technique 
is based on time-stepping the (artificial) unsteady problem, derived from the original one by 
adding a partial derivative in time of the unknown variables. Then, on each node, a new value for 
the solution vector is obtained from the previous one by a Taylor series in time, up to second 
order terms. The first and the second order term are then discretised by applying cell vertex finite 
volume techniques based on the macro-cell [12]. 




Figure 3: Plots of contour lines of the functions v (top) and u (bottom). 

Now we can prove that, for uniform quadrilateral grids, the cell vertex approximation of the
second order term in the Lax-Wendroff method is given (up to the multiplicative constant $\delta t^2$) by

$$-\frac{1}{2} A^T (Aw - f) . \qquad (38)$$

Hence, the application of a LW iteration, which consists of only the second order term (take
$\delta t = h$), is equivalent to the Kaczmarz relaxation. On the other hand we notice that once the grid
is non-uniform the least squares approach is difficult to apply, while the corresponding 
Lax-Wendroff iteration is of immediate application. Therefore, on non-uniform grids we extend 
the full multi-grid algorithm presented above by using a second order LW iteration as a smoother. 
As a simple example of application we use this algorithm to solve the homogeneous
Cauchy-Riemann equations on the geometry of the “bump” problem, subject to the condition
$(u,v) \cdot n = 0$, except at the inflow and outflow boundaries where $u = 1$. In Figure 3 we plot the
contour line of the function u computed by a 3-FMG method ($M = 6$).
DATA ASSIMILATION**


Achi Brandt and Leonid Yu. Zaslavsky 
Rehovot, 76100, Israel


SUMMARY 


A multiscale algorithm for the problem of optimal statistical interpolation of observed
data has been developed. This problem includes the calculation of the vector
of the “analyzed” (best estimated) atmosphere flow field $w^a$ by the formula

$$w^a = w^f + P^f H^T y ,$$

where the quantity $y$ is defined by the equation

$$(H P^f H^T + R) y = w^o - H w^f ,$$

using the given model forecast first guess $w^f$ and the vector of observations $w^o$; $H$ is
an interpolation operator from the regular grid to the observation network, $P^f$ is the
forecast error covariance matrix, and $R$ is the observation error covariance matrix.

At this initial stage the case of univariate analysis of single level radiosonde height
data is considered. The matrix $R$ is assumed to be diagonal, and the matrix $P^f$ is
assumed to be given by the formula $P^f_{ij} = \sigma^f_i \mu_{ij} \sigma^f_j$, where $\mu_{ij}$ is a smooth, decreasing
function of the distance between the ith and the jth points.

Two different multiscale constructions can be used to efficiently solve the problem
of optimal statistical interpolation: a technique for fast evaluation of the discrete
integral transform $\sum_j P^f_{ij} v_j$ and a fast iterative process which effectively works with
a sequence of spatial scales. In this paper we describe a multiscale iterative process
based on a multiresolution, simultaneous displacement technique and a localized
variational calculation of iteration parameters.

*A preliminary version of the material presented here has been presented in [1], 
INTRODUCTION


The problem of optimal statistical interpolation of the observed data includes the
calculation of the vector of the “analyzed” (best estimated) atmosphere flow field $w^a$
by the formula

$$w^a = w^f + P^f H^T y ,$$

where the quantity $y$ is determined from the equation

$$(H P^f H^T + R) y = w^o - H w^f , \qquad (1)$$

using the given model forecast first guess $w^f$ and the vector of observations $w^o$ ([2]-[4]).
Typically, $w^f$ is defined on a regular spherical grid, while the set of observations
$w^o$ is defined on an irregular network of observation points; $H$ is an interpolation
operator from the regular grid to the observation network, $P^f$ is the forecast error
covariance matrix, and $R$ is the observation error covariance matrix.

The observation error covariance matrix $R$ is assumed to be diagonal with

$$R_{ii} = (\sigma^o_i)^2 .$$

The forecast error covariance function $P^f(x_1, x_2)$ is defined for any pair of points $x_1$
and $x_2$ on the sphere by the formula

$$P^f(x_1, x_2) = \sigma^f(x_1) \mu(x_1, x_2) \sigma^f(x_2) ,$$

where the forecast error correlation function $\mu(x_1, x_2)$ is described as a smooth, decreasing
function of the distance between the points $x_1$ and $x_2$ [4]. The matrices $P^f$ and $\mu$
are the restrictions of the functions $P^f(x_1, x_2)$ and $\mu(x_1, x_2)$ to the regular latitude-longitude
grid.

The purpose of this paper is to conceptualize a fast multiscale iterative process for
solving equation (1) for $y$ when the observation network is strongly inhomogeneous in
space. At this initial stage, we consider a univariate analysis of single level radiosonde
height data.

In this paper we consider only convergence properties of the iterative process.
Accordingly, in the computer experiments all summations have been performed in a 
straightforward manner. An effective procedure for the fast evaluation of the integral 
transform on the sphere, based on the Brandt and Lubrecht approach [5], will be 
presented separately. 

Without loss of generality, equation (1) can be replaced by the system of equations

$$\sum_j \tilde{P}^f_{ij} y_j + R_{ii} y_i = w^o_i - (H w^f)_i , \qquad (2)$$

where

$$\tilde{P}^f_{ij} = \sigma^f_i \mu_{ij} \sigma^f_j$$

for the ith and jth observation points $x_i$ and $x_j$,

$$\mu_{ij} = \mu(x_i, x_j) ,$$

and

$$\sigma^f_i = (H \sigma^f)_i .$$

(While the matrix $P^f$ is defined for the points of the regular grid and interpolated to
the observation network using the operator $H$, the matrix $\tilde{P}^f$ is defined by the same
formula directly on the observation network.)

Indeed, the difference

$$H P^f H^T y - \tilde{P}^f y$$

may be treated as an additional source on the right hand side. This small term is
nonprincipal at all scales and can easily be taken into account in iterations.

Since we want to deal explicitly with the smoothness properties of the kernel $\mu_{ij}$,
we replace (2) by

$$\sum_j \mu_{ij} u_j + \frac{R_{ii}}{(\sigma^f_i)^2} u_i = \frac{1}{\sigma^f_i} \bigl( w^o_i - (H w^f)_i \bigr) , \qquad (3)$$

where $u_i = \sigma^f_i y_i$. The system of equations (3) can be written in matrix notation as

$$A u = f , \qquad (4)$$

where the matrix $A$ is symmetric and presumably positive definite.
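A small sketch of how a matrix with the structure of (3) might be assembled for a synthetic observation network. The Gaussian correlation used here is an assumption made only so that the example is self-contained; it is not the correlation model of the paper. The station coordinates are random, and the error magnitudes (14.6 m, 18 m to 35 m) and correlation distance (951 km) are the values quoted in the numerical results.

```python
import numpy as np

def assemble_A(points, sigma_f, sigma_o, corr_length):
    """A = mu + diag(R_ii / sigma_f_i**2), with R_ii = sigma_o**2."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    # Assumed Gaussian correlation; a stand-in for the paper's correlation function.
    mu = np.exp(-0.5 * dist2 / corr_length**2)
    return mu + np.diag((sigma_o / sigma_f) ** 2)

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 5000.0, size=(50, 2))   # synthetic station coordinates, km
sigma_f = rng.uniform(18.0, 35.0, size=50)        # forecast error magnitudes, m
A = assemble_A(points, sigma_f, sigma_o=14.6, corr_length=951.0)
smallest_eigenvalue = np.linalg.eigvalsh(A).min()
```

With a positive semidefinite correlation matrix, the added positive diagonal makes $A$ positive definite, which is consistent with the remark that $A$ is presumably positive definite.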


GENERAL STRATEGY 


It is important to understand why many common iterative processes, such as
Jacobi, Gauss-Seidel, or conjugate gradient, converge slowly when applied to equation
(4). Let us consider, for example, the simplest iterative process

$$u^{(n+1)} = u^{(n)} + \omega r^{(n)} , \qquad (5)$$

where the residual

$$r^{(n)} = f - A u^{(n)} ,$$

and the parameter $\omega \approx (\rho(A))^{-1}$, where $\rho(A)$ is the spectral radius of the operator $A$.

The process (5) reduces effectively the error components that correspond to the
large eigenvalues $\lambda_l$, such that

$$\omega \lambda_l \sim 1 ,$$

while the error components that correspond to the small eigenvalues $\lambda_s$, for which

$$\omega \lambda_s \ll 1 ,$$

are reduced slowly [6]. Since the summation $\sum_j \mu_{ij} u_j$ in (3) is made with a smooth kernel,
eigenvectors of $A$ that correspond to the large eigenvalues are (mostly) spatially
smooth, and eigenvectors of $A$ that correspond to small eigenvalues are oscillatory
in space. Therefore, one cannot define one particular value of $\omega$ that would give an
essential reduction of all spectral error components.
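The argument can be made concrete with a few numbers. Under iteration (5) the error satisfies $e^{(n+1)} = (I - \omega A) e^{(n)}$, so the component along an eigenvector with eigenvalue $\lambda$ is multiplied by $1 - \omega\lambda$ at each step. The sketch below evaluates these factors for $\omega = 1/\rho(A)$; the eigenvalues are made up for illustration.

```python
import numpy as np

def richardson_factors(eigenvalues):
    """Per-step error reduction factors |1 - omega * lambda| for omega = 1 / rho(A)."""
    lam = np.asarray(eigenvalues, dtype=float)
    omega = 1.0 / lam.max()
    return np.abs(1.0 - omega * lam)

# Hypothetical spectrum: small eigenvalues (oscillatory modes) up to large ones (smooth modes).
factors = richardson_factors([0.2, 1.0, 10.0, 50.0])
print(factors)
```

The component belonging to the largest eigenvalue is removed at once, while the component belonging to the smallest eigenvalue is barely changed, which is the slow convergence described above.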

The effect described above is well-studied for the case when (4) is obtained as a
grid approximation of the continuous integral equation. A few multiscale techniques
based on multigrid ([5], [7]) and wavelet [8] approaches were developed in the 1990’s.
Unfortunately, these techniques cannot be applied to the considered problem in a
straightforward manner because of the strong inhomogeneity of the observation
network.

The central idea of the approach developed below is to filter sequentially spectral
components of $r^{(n)}$ and to choose for each a value of the iteration parameter that
gives an essential reduction of the corresponding error component.

The major particular difficulty which has been overcome successfully in this work
is how to define variable pass spatial filters $\mathcal{F}_h$, depending on the scale parameter $h$,
for a field defined on a very inhomogeneous network. An appropriate filter will be
described in §3.

When some component $\mathcal{F}_h r^{(n)}$ of the residual has been filtered, one should
next calculate the correction vector. A simple way to do it is to use a scalar iteration
parameter $\omega_h$ (i.e., to calculate the correction as $\omega_h \mathcal{F}_h r^{(n)}$). Then the modified
iterative process (5) can be written as

$$u^{(n+1)} = u^{(n)} + \omega_h^{(n)} \mathcal{F}_h r^{(n)} , \qquad (6)$$

where the iteration parameter depends on the scale $h$ in some way.

An intrinsic disadvantage of schemes like (6) is that one global iteration parameter
$\omega_h^{(n)}$ is determined for the entire domain. The optimal correction at a spatial point
$x_i$ should, however, depend only on the residual values at points located at most a
few $h$ from $x_i$. Therefore, in §4, we construct a procedure for calculating an iteration
parameter $\omega_{h,i}^{(n)}$ for each point $x_i$ locally, using only the values of the residual
components in some area around $x_i$. This means that the iterative process which we
construct can be written as

$$u^{(n+1)} = u^{(n)} + \Omega_h^{(n)} \mathcal{F}_h r^{(n)} , \qquad (7)$$

where

$$\Omega_h^{(n)} = \mathrm{diag}(\omega_{h,i}^{(n)}) .$$

We discuss the structure of the multilevel iterative cycle in §5.

SPATIAL FILTER 

We now define a filter applicable to functions defined on a very irregular discrete
network. Obviously, we want our filter to work like a usual spectral high pass filter
in the data dense regions.

What do we want to get in regions of sparse data? Suppose we have an observation
point $s$ which is separated from other points by distance

$$d_s = \min_{p \neq s} \mathrm{dist}(s, p) .$$

We would like to take into account the residual component $r_s$ only on the scales $h$
that are large enough:

$$h \sim d_s \quad \text{and} \quad h > d_s ;$$

we neglect $r_s$ on the small scales $h$:

$$h < d_s .$$

We define the filter $\mathcal{F}_h$ which satisfies these requirements by the formula

$$(\mathcal{F}_h r)_i = r_i - \gamma_i \sum_j r_j \exp\left( -\frac{1}{2} \frac{\mathrm{dist}^2(i,j)}{h^2} \right) , \qquad (8)$$

where $h$ is the current scale, and the parameter $\gamma_i$ is defined by the formula

$$\gamma_i = \left[ \sum_j \exp\left( -\frac{1}{2} \frac{\mathrm{dist}^2(i,j)}{h^2} \right) \right]^{-1} .$$

Note that the filter can be calculated efficiently using the fast summation procedure.
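A minimal sketch of a filter of this kind, evaluated by direct summation on a synthetic network with one isolated point. The Gaussian weight with the factor 1/2 in the exponent is an assumption about the exact form of the kernel, and the function names are illustrative.

```python
import numpy as np

def pairwise_dist2(points):
    """Squared distances between all pairs of points (rows of `points`)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff**2, axis=-1)

def spatial_filter(r, dist2, h):
    """r_i minus the normalised Gaussian-weighted average of r around point i."""
    w = np.exp(-0.5 * dist2 / h**2)   # assumed kernel width convention
    gamma = 1.0 / w.sum(axis=1)
    return r - gamma * (w @ r)

rng = np.random.default_rng(0)
cluster = rng.uniform(0.0, 1.0, size=(20, 2))          # data dense region
points = np.vstack([cluster, [[100.0, 100.0]]])        # plus one isolated point
dist2 = pairwise_dist2(points)
r = rng.standard_normal(len(points))
filtered = spatial_filter(r, dist2, h=1.0)
```

At the isolated point the weighted average reduces to the point's own value when $h$ is much smaller than $d_s$, so the filtered residual vanishes there. A constant residual is also removed everywhere, as one would expect from a high pass filter.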


CALCULATION OF $\Omega_h$


The scalar iteration parameter in (6) can be determined from the variational
condition of minimizing the Euclidean norm of the scale $h$ component of the new
residual $r^{(n+1)}$:

$$\min_{\omega_h^{(n)}} \| \mathcal{F}_h r^{(n+1)} \| ,$$

where $\| \cdot \|$ is the Euclidean norm on the observation network,

$$\| u \|^2 = (u, u)$$

and

$$(u, v) = \sum_i u_i v_i ,$$

where the summation is made over the observation points. This condition leads to
the formula

$$\omega_h^{(n)} = \frac{(p, q)}{(q, q)} , \qquad (9)$$

where

$$p = \mathcal{F}_h r^{(n)} , \qquad q = \mathcal{F}_h A p .$$
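For completeness, the step from the variational condition to (9) can be written out as follows. The only property used is that the filter is linear; $\omega$ abbreviates $\omega_h^{(n)}$.

```latex
\begin{aligned}
r^{(n+1)} &= f - A\bigl(u^{(n)} + \omega p\bigr) = r^{(n)} - \omega A p, \\
\mathcal{F}_h r^{(n+1)} &= p - \omega q, \\
\|p - \omega q\|^2 &= (p,p) - 2\omega\,(p,q) + \omega^2 (q,q), \\
\frac{d}{d\omega}\,\|p - \omega q\|^2 &= -2\,(p,q) + 2\omega\,(q,q) = 0
\quad\Longrightarrow\quad \omega = \frac{(p,q)}{(q,q)} .
\end{aligned}
```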


As mentioned above, one disadvantage of this formula is its globality. In order to
localize it, we use a family of weighted Euclidean norms. Let us introduce

$$(u, v)_{l,i} = \sum_j u_j v_j \exp\left( -\frac{1}{2} \frac{\mathrm{dist}^2(i,j)}{l^2} \right) ,$$

where the summation is made over the observation points and

$$\| u \|_{l,i}^2 = (u, u)_{l,i} .$$

Now we can define the matrix $\Omega_h$ in (7). We choose $\Omega_h$ as follows:

$$\Omega_h^{(n)} = \mathrm{diag}(\omega_{h,i}^{(n)}) ,$$

where

$$\omega_{h,i}^{(n)} = \frac{(p^{(n)}, q^{(n)})_{3h,i}}{(q^{(n)}, q^{(n)})_{3h,i}} ,$$

$$p^{(n)} = \mathcal{F}_h r^{(n)} , \qquad q^{(n)} = \mathcal{F}_h A p^{(n)} .$$

Note again that the fast summation procedure can be used to calculate $\omega_{h,i}^{(n)}$ efficiently.
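The local parameters can be sketched with direct summation as follows. The Gaussian weight and the width of three times $h$ follow the reading of the formulas above and should be treated as assumptions; `local_omega` and its arguments are illustrative names.

```python
import numpy as np

def local_omega(p, q, dist2, h, width=3.0):
    """omega_i = (p, q)_{l,i} / (q, q)_{l,i} with l = width * h."""
    l = width * h
    w = np.exp(-0.5 * dist2 / l**2)   # w[i, j]: weight of point j in the norm centred at point i
    return (w @ (p * q)) / (w @ (q * q))

rng = np.random.default_rng(1)
points = rng.uniform(0.0, 10.0, size=(30, 2))
diff = points[:, None, :] - points[None, :, :]
dist2 = np.sum(diff**2, axis=-1)
p = rng.standard_normal(30)
omega = local_omega(p, 2.0 * p, dist2, h=1.0)
```

In the special case $q = 2p$ every local parameter equals 1/2, the same value the global formula (9) would give, which is a simple consistency check on the weighted inner products.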


STRUCTURE OF THE MULTISCALE ITERATIVE CYCLE 


In order to define the order of the multiresolution iterations, we have to prescribe
some spatial scale to each iteration. The current spatial scale is determined by the
formula

$$h = H \cdot 2^{1 - \mathrm{level}(n)} ,$$

where $H$ is the largest scale and level($n$) is the level prescribed for the $n$th iteration.
If level($n$) = 0, the filter is not used. The multiscale iterative algorithm can be
written as follows:

DO n = 1, NITER

    Residual calculation:
        $r^{(n)} = f - A u^{(n)}$

    IF level(n) > 0 THEN

        Definition of the current scale:
            $h = H \cdot 2^{1 - \mathrm{level}(n)}$

        Filtering:
            $p^{(n)} = \mathcal{F}_h r^{(n)}$
            $q^{(n)} = \mathcal{F}_h A p^{(n)}$

        Calculation of the iteration parameters:
            $\omega_{h,i}^{(n)} = (p^{(n)}, q^{(n)})_{3h,i} / (q^{(n)}, q^{(n)})_{3h,i}$
            $\Omega_h^{(n)} = \mathrm{diag}(\omega_{h,i}^{(n)})$

        Calculation of the new approximation to the solution:
            $u^{(n+1)} = u^{(n)} + \Omega_h^{(n)} p^{(n)}$

    ELSE

        Filtering is not used:
            $p^{(n)} = r^{(n)}$
            $q^{(n)} = A p^{(n)}$

        Calculation of the iteration parameter:
            $\omega^{(n)} = (p^{(n)}, q^{(n)}) / (q^{(n)}, q^{(n)})$

        Calculation of the new approximation to the solution:
            $u^{(n+1)} = u^{(n)} + \omega^{(n)} p^{(n)}$

    ENDIF
ENDDO

We have used for our initial tests the standard V(2,2) multilevel cycle with 3
iterations at the 0th level ([9], [10]). This means that the function level($n$) is periodic:

$$\mathrm{level}(LC + k) = \mathrm{level}(k) \quad \text{for any } k > 0 ,$$

and

$$\mathrm{level}(n) = \begin{cases} NLVL - k + 1, & \text{if } n = 2k - 1 + l; \quad k = 1, 2, \ldots, NLVL; \quad l = 0, 1 \\ 0, & \text{if } n = 2 \cdot NLVL + l; \quad l = 1, 2, 3 \\ k, & \text{if } n = 2 \cdot NLVL + 2 + 2k + l; \quad k = 1, 2, \ldots, NLVL; \quad l = 0, 1 \end{cases}$$

where NLVL is the finest level number and $LC = 4 \cdot NLVL + 3$.
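The schedule can be written as a short function. This is a sketch that follows the piecewise definition of level($n$) and the scale formula $h = H \cdot 2^{1 - \mathrm{level}(n)}$; the function names are illustrative, and the largest scale of 10 000 km used in the example is the value listed for level 1 in the numerical results.

```python
def level(n, nlvl):
    """Level prescribed for iteration n >= 1 in the periodic V(2,2) schedule."""
    lc = 4 * nlvl + 3                # length of one cycle
    m = (n - 1) % lc + 1             # position within the cycle, 1..lc
    if m <= 2 * nlvl:                # two iterations per level, smallest scale first
        k = (m + 1) // 2
        return nlvl - k + 1
    if m <= 2 * nlvl + 3:            # three iterations without filtering
        return 0
    return (m - 2 * nlvl - 2) // 2   # two iterations per level, largest scale first

def scale(lvl, largest_scale):
    """Spatial scale for a level >= 1: h = H * 2**(1 - level)."""
    return largest_scale * 2.0 ** (1 - lvl)

schedule = [level(n, 5) for n in range(1, 24)]
scales_km = [scale(lvl, 10000.0) for lvl in range(1, 6)]
print(schedule)
print(scales_km)
```

With NLVL = 5 one cycle consists of 23 iterations: it starts on the smallest scale, passes through the unfiltered level 0, and returns to the smallest scale.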
NUMERICAL RESULTS


At this initial stage of the work we made all the numerical tests with radiosonde 
height data only. The forecast correlation function is modeled by the formula 


fi{x u x 2 ) 


/ (dist(a: 1 , a’ 2 )) 2 

\ L 2 


- 1 . 20 s 


where dist($x_1$, $x_2$) is the three-dimensional distance between points $x_1$ and $x_2$ and $L$
is the correlation distance ($L$ = 951 km).

The radiosonde station locations and values of $\sigma^o$, $\sigma^f$, and $w^o - H w^f$ were obtained
from the Data Assimilation Office of NASA/Goddard Space Flight Center. The data
file contains model parameters and radiosonde height observations from 715 stations.
Observation error variances $\sigma^o$ were taken to be equal to 14.6 m for all radiosonde
stations. Forecast error variances $\sigma^f_i$ vary from point to point and range from 18 m
to 35 m.

We made our experiments with NLVL = 5. The scales which were used are shown 
in Table 1. The results of our experiments are shown in Table 2. 


Table 1. Scale structure

  Level number      1        2       3       4       5
  Scale $h$, km     10 000   5 000   2 500   1 250   625


Table 2. Convergence of the iterative procedure

  Multiscale cycle   $L_2$ norm of the residual   Rate of decrease of the norm
  Initial            $2.55 \cdot 10^{+1}$
  1                  $6.86 \cdot 10^{-1}$         0.027
  2                  $3.76 \cdot 10^{-2}$         0.054
  3                  $2.05 \cdot 10^{-3}$         0.054
  4                  $8.87 \cdot 10^{-5}$         0.043
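The rate column is the ratio of successive residual norms, which can be recomputed from the norm column. The sketch below reads the last norm as 8.87e-5, which is an interpretation of a partly illegible entry.

```python
# Residual norms from Table 2: initial value, then after cycles 1 to 4.
residual_norms = [2.55e+1, 6.86e-1, 3.76e-2, 2.05e-3, 8.87e-5]
tabulated_rates = [0.027, 0.054, 0.054, 0.043]

rates = [after / before for before, after in zip(residual_norms, residual_norms[1:])]
print([round(rate, 4) for rate in rates])
```

The recomputed ratios agree with the tabulated rates to within about one unit in the last printed digit, so each multiscale cycle reduces the residual by roughly a factor of 20 or more.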


DISCUSSION: FURTHER IMPROVEMENTS 


The algorithm described above represents the first step toward development of a
fast and efficient solver for the atmospheric data assimilation problem. It has been
shown that the multiresolution algorithm can provide a fast solver. As long as the
number of measurements is moderate (e.g., less than a few thousand), this algorithm
by itself is already effective enough. However, for larger sets of measurements, a major
part of the work per cycle can be saved by a more advanced multiscale algorithm that
features the following improvements.

First, as already mentioned, a fast evaluation of the operator $P^f$ (i.e., of the
multi-summation in Eq. (3)) can be based on the method of [5]. (See also [7].)
Fast multi-summation can also be used for fast filtering.

Secondly, at each scale $h$ in any region where the number of measurements per
$O(h \times h)$ cell is large, the multitude of residuals can be replaced by their proper
local averages on a grid with mesh size $O(h)$. Similarly in such regions, the
correction $u^{(n+1)} - u^{(n)}$ will also be calculated on such a grid and will only later
be interpolated to the measurement points. Actually, the residual averagings
and the correction interpolation will not be done directly between the finest
(measurements) level and each scale-$h$ level but will be transferred sequentially
through all intermediate levels.

Thirdly, the residual filtering can be replaced by distributive relaxation (as in 
[5]). The latter is simple in the grid regions mentioned above. In other regions, 
the filtering techniques may be easier to apply. 

These improvements will reduce the work of a cycle to a few operations per mea- 
surement, hopefully retaining the same convergence rates. 
Effective Boundary Treatment for the Biharmonic Dirichlet Problem*


A. Brandt 
J. Dym†

Department of Applied Mathematics and Computer Science 
The Weizmann Institute 
Rehovot, 76100, Israel 


SUMMARY 


The biharmonic equation can be rewritten as a system of two Poisson equations 
[6, 4]. Multigrid solution of this system is expected to converge with the same amount 
of work as solving two Poisson equations, requiring less than 70 floating point oper- 
ations (scalar multiply or addition) per fine grid point to reach a solution using an 
FMG algorithm. For periodic boundary conditions, this goal is attained by simple, 
straightforward application of multigrid. For Dirichlet boundary conditions, how- 
ever, convergence is impeded by poor interaction with the boundaries. Attempts to 
overcome the slowness without specifically addressing the boundaries have resulted 
in multigrid algorithms not attaining the Poisson convergence rate [3, 7]. 

We present three methods of boundary treatment with which full multigrid effi- 
ciency can be obtained. All implement an approach described by Brandt [1], concen- 
trating some additional effort near the boundary. The first approach [9, 5] simply 
adds a number of relaxation sweeps over points close to the boundary. The second [8] 
uses joint relaxation on near-boundary points. The third method [5] takes something 
from each of the first two methods, resulting in a solver more suitable for highly 
parallel applications. 


*Research Supported by Israel Ministry of Science Grant 4135-1-93, and by the C. F. Gauss
Minerva Center for Scientific Computation at the Weizmann Institute of Science, Rehovot, Israel.

†Present Address: Department of Mathematics DRB 155, USC, 1042 W. 36th Pl., LA, Calif.,
Introduction


The biharmonic operator surfaces in a large number of applications. It is more effi- 
ciently solved as a system of two Poisson equations. The finite difference multigrid 
solver for this system is sensitive to the boundary conditions associated with the 
problem; for some, fast convergence is achieved with no special effort, while other 
boundary conditions require careful treatment of the gridpoints at and about the 
boundary to attain full multigrid efficiency. 

The Dirichlet boundary conditions are an example of the second type. Without
special care, the multigrid boundary convergence rate dominates the process after
a short while, slowing down the entire process. Several methods for treating the
boundary have been developed. Two are presented here, along with a newly devised
method, more suitable for use with parallel computation.


The Biharmonic Dirichlet Problem 


The biharmonic equation is

$$\Delta^2 u = f \qquad (1)$$

within a given domain $\Omega$, along with two boundary conditions on $\partial\Omega$, where $\Delta$
represents the Laplacian operator. The Dirichlet boundary conditions are

$$u = \phi , \qquad \frac{\partial u}{\partial n} = \psi \qquad (2)$$

on the boundary $\partial\Omega$. In [4], Ciarlet and Raviart quote Glowinski [6], who suggested
that the equation can be more efficiently solved as a system,

$$\Delta u - v = 0 , \qquad \Delta v = f . \qquad (3)$$

They prove that this system of equations can be solved with (single-level) efficiency
equal to that of a Poisson equation solver¹. The problem has also been extensively
analyzed in multigrid literature [8, 9, 3, 7]. The latter two prove formally the con- 
vergence of straightforward multigrid solvers for (3), or slight modifications thereof 
(in [7]). The algorithms considered use a relaxation sweep over the entire domain 
as the smallest ‘unit of computation’. As a result, they are slow, requiring a large 
number of sweeps at each level to converge [3] or converging relatively slowly [7]. The 
algorithms described in the former two references implement (in different ways) an 
idea described by Brandt [1], concentrating on the area near the boundary, where 
slow convergence holds up the entire process. 
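As a brief reminder of why the system (3) reproduces the biharmonic equation (1), the auxiliary unknown $v$ can be eliminated by substituting the first equation into the second:

```latex
\Delta u = v, \quad \Delta v = f
\quad\Longrightarrow\quad
\Delta^2 u = \Delta(\Delta u) = \Delta v = f .
```

The two boundary conditions are both imposed on $u$, so none is given directly for $v$; this is the source of the coupling near the boundary discussed in this paper.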

¹ Note that evaluating both equations of the system requires less work than evaluating the biharmonic
equation, although this, of course, is not the main computational benefit.
Basis For Comparison


The desired cycling convergence rate for multigrid solvers of (3) is the interior smooth- 
ing rate of the Poisson equation (for whatever number of relaxations is performed on 
the finest grid per cycle). For lexicographic ordering, this amounts to a factor of 2 
per sweep, while for Red-Black ordering it is 4 for one sweep, 16 for two and 27 for 
three [1]. 

These rates, however, are achievable only for 2-level algorithms with ideal intergrid 
transfers. For practical multigrid applications, the convergence rate can be somewhat 
smaller, depending on the cycle parameters (including choice of intergrid transfers). 
Thus, the convergence of the Dirichlet boundary condition solver should be compared 
with that of a solver of (3) with periodic boundary conditions and otherwise identical 
parameters (there being no coupling of the equations near the boundary, this behaves 
exactly like two Poisson equations; note that an additional constraint must be sup- 
plied to each equation to make it nonsingular). Table 1 shows the convergence rates 
attained for various types of cycles, using full weighting and bi-linear interpolation for 
coarsening and prolongation respectively and Red-Black relaxation ordering. Using
higher order interpolation would reduce the gap between the W(2,1) smoothing rate
and attained periodic convergence.


  Cycle                       W(1,1)   W(2,1)   V(1,1)   V(2,1)
  Interior Smoothing Rate     16       27       16       27
  Periodic B.C. convergence   15       19       8.5      12.5

Table 1: Comparison basis. Interior smoothing rates and attained multigrid
cycling rates for periodic boundary conditions.


Boundary Condition Discretization 

The Dirichlet boundary conditions (2) constrain the values of $u$ on the boundary
and of the derivative of $u$ normal to the boundary to some given values. The normal
derivative is discretized by introducing a set of virtual $u$-points, parallel to the
boundary and one mesh size from it (e.g., for the unit square, lines of points $u(-h, y)$,
$u(1 + h, y)$, $u(x, -h)$, and $u(x, 1 + h)$). Typically, the normal derivative is approximated
using a central difference between the virtual points and the first interior
$u$-points.

The second equation in (3) holds in the interior, but not on the boundary or the
virtual layer; $v$, however, is defined on the boundary as well as in the interior. When
performing relaxation, the boundary conditions on $u$ and on $\partial u / \partial n$ are used to set $u$
values on the true and virtual boundary², respectively, while the first equation in (3)
on the boundary is used to determine $v$ there.

² Actually, in all the algorithms implemented here, the virtual layer is implicit. The normal
derivative boundary condition is used to substitute interior and boundary values of $u$ whenever a
virtual $u$ is called for.
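To illustrate how the virtual layer can be eliminated, consider the side $x = 0$ of the unit square, where the outward normal points in the negative $x$ direction. The worked step below is a sketch under assumptions not stated in the text: $\phi$ and $\psi$ denote the prescribed boundary values of $u$ and of its normal derivative, and the Laplacian is discretized by the standard five-point formula.

```latex
\begin{aligned}
\frac{u(-h,y) - u(h,y)}{2h} = \psi(0,y)
  &\quad\Longrightarrow\quad u(-h,y) = u(h,y) + 2h\,\psi(0,y), \\
v(0,y) &= \frac{u(-h,y) + u(h,y) + u(0,y-h) + u(0,y+h) - 4u(0,y)}{h^2} \\
       &= \frac{2u(h,y) + 2h\,\psi(0,y) + \phi(0,y-h) + \phi(0,y+h) - 4\phi(0,y)}{h^2} .
\end{aligned}
```

In this form the boundary value of $v$ depends only on boundary data and on the first interior line of $u$, so no virtual points need to be stored.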
Boundary Slowdown


Applying the multigrid algorithm to the Dirichlet problem results in convergence 
rates much slower than those obtained with periodic boundary conditions. Figure 1 
describes the problem better than a hundred words (but we’ll try anyway. . . ). Clearly, 
the effectiveness of the multigrid solver near the boundary is much less than for interior 
points. As a result, the asymptotic convergence rate of the entire solution grinds to 
a near halt. 
Figure 1: Boundary slowdown. A residual map (for $\Delta v = f$) after a number
of multigrid cycles without special boundary treatment. The
boundary residuals dominate.

A number of methods have been suggested to treat the slowness caused by the 
boundaries. Brandt in [1] has suggested adding extra relaxation sweeps at and around 
the boundaries, and has proved [2] that by doing so the efficiency dictated by the 
interior smoothing rates (as predicted by local mode analysis) can be attained. The 
additional work required is negligible relative to a full sweep on the entire domain. 
This idea has been partially implemented by Michel [9], who derived the adjusted 
residual transfers described above, and used them to measure convergence rates for 
lexicographic Gauss-Seidel relaxation. It is implemented here for Red-Black ordering. 
A different idea was suggested by Linden [8] and also by Papamanolis [10], who 
propose simultaneous relaxation of the boundary (v only, as u is given there) along 
with one neighboring interior point. Experimentally, this method has produced good 
results for grids up to 256 by 256. It only works, however, for the slower lexicographic 
relaxation ordering. After presenting these two methods, a new method fusing the 
two approaches will be presented, based on using simultaneous relaxation on a wider 
and deeper boundary strip. This method achieves the desired convergence rate for 
Red-Black ordering as well (but only for W cycles). Its main advantage (relative to 
the first method) lies in greater efficiency for massively parallel implementations. 
Boundary Relaxations


In this method, a small number of relaxation sweeps are performed over the boundary
and a thin layer of points adjacent to the boundary. The extra work per relaxation
sweep is $O(\sqrt{N})$ (N being the total number of points), and thus negligible
relative to a full sweep ($O(N)$). The results to be presented were obtained using the
following steps for a relaxation sweep:

• Perform the following NREP times:

  1. Relax v on the boundary.

  2. Relax u and v on DEPTH interior layers, starting nearest to the boundary.

• Using Red-Black ordering, relax the entire domain (including the boundary).
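
The scaling of the extra work can be checked with a short count. The sketch below is only an illustration under stated assumptions: the grid is taken to be (n+1) x (n+1) points, every visit to a point is counted as one unit of work (although a boundary visit updates v only), and the helper names `layer_points` and `extra_work_fraction` are hypothetical. It suggests that the overhead relative to a full sweep roughly halves each time n doubles, which is the $O(\sqrt{N})$ versus $O(N)$ behavior in practice.

```python
def layer_points(n, k):
    """Number of grid points in the k-th square ring of an (n+1) x (n+1) grid.

    k = 0 is the boundary itself, k = 1 the first interior layer, and so on.
    """
    side = n - 2 * k  # number of mesh intervals along one side of ring k
    if side < 1:
        raise ValueError("ring does not exist on this grid")
    return 4 * side


def extra_work_fraction(n, nrep, depth):
    """Point visits spent on the boundary treatment, relative to one full sweep."""
    per_pass = sum(layer_points(n, k) for k in range(depth + 1))
    full_sweep = (n + 1) ** 2
    return nrep * per_pass / full_sweep


# Overhead of NREP = 3 passes over a DEPTH = 3 layer on successively finer grids.
for n in (64, 128, 256, 1024):
    print(n, round(extra_work_fraction(n, nrep=3, depth=3), 3))
```

Under this simple count the overhead is a visible share of a sweep on a 64 x 64 grid and becomes small only as the grid is refined.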

In all experiments, the domain is the unit square. Simultaneous relaxation is per- 
formed at interior points, meaning that new values are computed for both v and u 
(satisfying both equations of (3) there) each time the point is relaxed. NREP and 
DEPTH are parameters of the solver. The boundary-layer relaxation uses sequential 
ordering, although Red-Black ordering would probably serve just as well. Sensitivity 
of the algorithm to the order of execution (boundary relaxations before body, bound- 
ary relaxations from the border inward, etc.) was not rigorously tested. However, a 
casual sampling indicates that using Red-Black ordering for the boundary relaxations 
doesn’t affect the results, while relaxing the boundary layer from the interiormost part 
to the boundary worsens performance somewhat. 


Finest Grid   NREP   DEPTH   V(1,1)   V(2,1)   W(2,1)   W(1,1)

    64          ?      ?     < 1.5     < 2      < 4      < 4
                1      2       3        5       10        4
                ?      1       5        9       16        ?
                2      2       5       10       16        ?
                ?      3       6       10       18       10
                ?      1       ?        3        8        ?
                3      2       ?       13       20        ?
                ?      3      10       13       18        ?

   128          2      2       5        9       16        9
                3      2       8       12       20        9
                3      3       8       13       18       13

   256          2      2       5        8        ?        9
                3      2       8       12        9        9
                3      3       8       11        ?       13

Table 2: Cycling convergence rates. 


Most of the experimentation was performed on a 64 x 64 finest grid, with other
grids being tested to examine the effect of the grid size on the necessary boundary
treatment. In all computations the coarsest grid was 4 x 4, except for the 256 x 256
finest grid, for which the coarsest grid was 8 x 8, as the software was designed for a
maximum level depth of six. Table 2 summarizes the results.

Let us now try to make some sense of the numbers in Table 2.
First, it is evident for all types of cycle that a boundary layer of depth one (with
any number of boundary passes), or a single boundary pass (with a boundary layer
up to three deep), will not do (some of these results are not displayed in the table).
For (2,1) cycles (W and V), two passes with a layer of width two bring convergence to
about 80% of the periodic b.c. rate. Adding another pass improves results to 100%.
For V(1,1) cycles, nearly the same is true.

The situation is a little different for W(1,1) cycles. Here, the maximal convergence
rate obtained is about 13, slightly lower than the periodic b.c. rate of 15. Three passes
are necessary over a boundary layer of width three.

For W cycles the results seem to be independent (or, at worst, imperceptibly 
dependent) on the gridsize. For V cycles, however, results deteriorate slowly with 
growing grids, requiring slowly increasing values for NREP and DEPTH (although 
the additional work remains negligible). 
Figure 2: Treated boundary. A residual map (for $\Delta v = f$) after 30 multigrid
cycles, adding boundary relaxations (three sweeps of a three-deep
boundary layer). The boundary residuals are of the same magnitude
as the interior residuals.

The added relaxations on the boundary alter the residual map shown in Figure 1.
Figure 2 shows the result: with three boundary sweeps over a boundary layer three
units wide, after thirty W(2,1) cycles (long after convergence to machine accuracy is
attained, if desired; for the purpose of measuring asymptotic convergence rates, the
error is artificially magnified between cycles), the residuals on the boundary are still
of the same order of magnitude as those in the interior. Thus, the goal of converging
as fast as a solver for the Poisson equation has been accomplished for W(2,1), V(2,1),
and V(1,1) cycles (and nearly so for W(1,1) cycles).

Figure 3: Joint relaxation (Linden’s method).

It is interesting to note that the boundary treatment can be overdone (too much of
a good thing...). Beyond a certain point, adding more sweeps for a given layer width
diminishes performance, probably due to the jump in residual magnitude at the
interface between the boundary strip and the interior.

This is the only method of the three presented here that reaches optimal performance
for V as well as W cycles. The main drawback of this method is its unsuitability
for massively parallel architectures, as body relaxation cannot proceed until
the requisite number of boundary relaxations has been performed (rather than wait
for two or three extra relaxations per sweep, it would be preferable to perform three
complete cycles with no boundary treatment, which will, for W(1,1) cycles, reduce
the residuals by a factor of over thirty).

Joint Relaxation 

An idea proposed independently by Linden [8] and Papamanolis [10] suggests relaxing 
each boundary v point together with its neighboring point (u and v), that is, solving a 
system for three simultaneous variables. At the corners, both near-corner v points are 
relaxed with the interior-corner point, a four-variable system. The variables joined in 
relaxation are shown in Figure 3. 

The method works well only for W cycles, and only when the boundary layer is
relaxed in lexicographic ordering. Table 3 shows convergence rates for an implementation
of Linden’s version of the algorithm, using both lexicographic and Red-Black
ordering on the boundary and on the interior. Poisson-solver convergence rates are
obtained for lexicographic ordering, but not for Red-Black. The same is true for other
W cycles, though the results are not displayed.

Finest Grid    Lex.    RB

    32          12     12
    64         8-10    12
   128         8-10    12
   256         8-10    11

Table 3: Cycling convergence rates, W(2,1) cycle, Linden’s method,
lexicographic and Red-Black ordering.

A curious feature of the solver (using lexicographic ordering) is that it appears to
have two stable rates of convergence, converging for a while at about 10 per cycle,
then at about 8. Theoretically, the rate of 8 should dominate, as this is the smoothing
rate for lexicographic ordering (in a (2,1) cycle) predicted by local mode analysis.
But this did not happen for more than 150 cycles using a 128 x 128 finest grid (on a
200 cycle trial, the first 25 cycles converged at a rate of about 10 per cycle, the next
60 at about 8, the next 60 at about 11, and from then on at about 8).


Merging Methods 


Using a method that, in a sense, merges the two ideas above, a new form of boundary
treatment is obtained, more suitable for parallel implementations. Relaxation is
performed in a manner similar to Linden’s algorithm, although Red-Black ordering
is used. A number of combinations were tested, one of which worked well. In the
method which worked (Figure 4), three interior points are relaxed along with two
boundary points (an eight-variable system: two unknowns at each interior point and
one at each boundary point). At the corners the four cornermost interior points and
their four boundary point neighbors combine to form a twelve-variable system. What
happens, in effect, is that nearly every point on the boundary layer is relaxed twice
(simultaneously with the interior), once with the red interior points, and again with
the black ones. The exceptions are two points in each corner which don’t get a second
relaxation.

Results are summarized in Table 4. In order to simulate parallel implementation
of the method, the boundary solver does not use new values of u and v computed in
the present half-sweep. Rather, the boundary relaxation during the ‘red’ part of the
sweep uses values computed prior to the sweep, and during the ‘black’ part it uses
values computed by the ‘red’ half.

Finally, it is worth noting that, with a bit of preprocessing, all the variables linked
in joint relaxation can be relaxed in parallel, with the number of operations per variable
proportional to the number of ‘neighbors’ of the system of equations. ‘Neighbors’
include the right hand sides of the equations (see footnote 3), and all u and v variables
participating in the equations but not solved for. Thus, the boundary system requires
eighteen operations per variable (each ‘operation’ consisting of a constant multiplication
and an addition), and the corner system twenty. Super parallel implementations can
perform these computations in logarithmic time, as they are totally independent of each
other. For comparison, relaxation of an interior point (after some economization) requires
nine or ten operations to update u (the new value of v is computed in the first four).
Here, each operation consists of a constant multiplication or an addition.

Figure 4: Joint relaxation. A single (v) variable at each boundary point and
two variables at each interior point.


Finest Grid   W(1,1)   W(2,1)   V(1,1)   V(2,1)

    32          14       20       6        11
    64          14       20       5         9
   128          14       20       4         7
   256          14       20       3.5       6

Table 4: Cycling convergence rates, Red-Black ordering, boundary relaxed
as in Figure 4.


3 On the finest grid, where the right hand side doesn’t change from cycle to cycle, a bit of 
preprocessing and some memory can reduce all the right hand side neighbors to a single operation 
per variable. The first and second methods will then each require only eleven operations per variable, 
and the corner system a mere nine. 
FMG Convergence


Using FMG and the boundary treatments outlined above, the biharmonic equation
can be solved in a single cycle to a point where the dominant term in the error
(for some norms) is due to the truncation error (the error due to the approximation
of the equation on a grid) rather than algebraic error (the error in the solution
of the equation). Table 5 shows the error in the solution obtained after one, two,
and three multigrid cycles using the FMG algorithm to provide the initial guess
for the finest level. Cubic interpolation is used to transfer the coarse grid solution
(u and v) to the fine grid. The differential solution for the system tested was
$x^2(1-x)^2\,y^2(1-y)^2$, and the errors in the solutions on each grid are measured
relative to this function. Three error norms (or seminorms) are shown: $L_\infty$,
$H_1$ ($\sqrt{\int u_x^2 + u_y^2}$), and $H_2$ ($\sqrt{\int u_{xx}^2 + u_{yy}^2 + 2u_{xy}^2}$).

                       V(1,1)      W(1,1)      W(1,1)       W(1,1)        W(2,1)
Grid   Norm          One Cycle   One Cycle   Two Cycles   Three Cycles   One Cycle

 64    $L_\infty$     3.6e-6      5.6e-6        ?           7.3e-6        6.2e-6
       $H_1$          1.1e-7      1.9e-7        ?           2.6e-7        2.1e-7
       $H_2$            ?         4.8e-8      3.1e-8        2.9e-8        3.5e-8

128    $L_\infty$     4.0e-7        ?           ?           1.8e-6        1.6e-6
       $H_1$            ?         2.4e-8      3.3e-8          ?           2.7e-8
       $H_2$          6.6e-9      4.5e-9      2.1e-9        1.9e-9        2.8e-9

256    $L_\infty$     1.3e-7      3.5e-7        ?           4.6e-7        3.9e-7
       $H_1$          1.0e-9        ?         4.2e-9        4.1e-9        3.4e-9
       $H_2$            ?           ?           ?           1.8e-10       2.7e-10

Table 5: FMG. Error (relative to the differential solution) after one V(1,1),
one, two and three W(1,1) cycles, and one W(2,1) cycle.

Clearly, after even one V(1,1) cycle (see footnote 4), the error ($L_\infty$ and $H_1$)
is primarily due to truncation. In fact, in this particular case the algebraic error
happens to cancel part of the truncation error, as solving the system to a higher degree
of accuracy increases the distance from the differential solution. This is not true for
the $H_2$ error, which does indeed get significantly reduced by further cycles. Using a
higher order of interpolation to transfer the initial guess to the fine grid should
correct this.

The number of operations (per fine grid point) required to solve the biharmonic
equation will therefore equal the work necessary to perform a single V(1,1) cycle on
each grid (coarsest to finest). Assuming 9 operations (constant multiply or add) per
point for each relaxation sweep (10 on coarser grids), about 16 for coarsening and
interpolation combined (note that residuals need be transferred from Red relaxation
points only), and neglecting the work added by the boundary treatment, this gives a
total of less than 70 operations per point to solve.
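
The estimate can be reproduced with a short tally. The sketch below is one possible bookkeeping, not necessarily the authors' own: it assumes standard 2D coarsening (each coarser grid has a quarter of the points), two relaxation sweeps per level in a V(1,1) cycle, and the full 16 transfer operations charged on every level, including the coarsest. The function names are hypothetical.

```python
def ops_per_point(level, sweeps=2, relax_fine=9, relax_coarse=10, transfer=16):
    """Operations per grid point for one visit to `level` (0 = finest grid)."""
    relax = relax_fine if level == 0 else relax_coarse
    return sweeps * relax + transfer


def v_cycle_work(top, levels):
    """Work of one V(1,1) cycle started on level `top`, per finest-grid point."""
    # Level l has 4**(-l) times as many points as the finest grid.
    return sum(ops_per_point(l) * 0.25 ** l for l in range(top, levels))


def fmg_work(levels):
    """FMG with a single V(1,1) cycle on every grid, coarsest to finest."""
    return sum(v_cycle_work(top, levels) for top in range(levels))


# Six levels, matching the maximum level depth mentioned in the text.
print(round(v_cycle_work(0, 6), 1), round(fmg_work(6), 1))
```

With these assumptions a single V(1,1) cycle from the finest grid costs somewhat under 50 operations per fine grid point, and the complete FMG solve stays below the quoted bound of 70.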

4 These results were obtained using joint boundary relaxation. Using extra boundary sweeps gives 
a similar but slightly better solution. However, for larger grids, it may be necessary to use boundary 
sweeps, use more points in joint relaxation, or use W(l, 1) cycles. 
ABSTRACT


An efficient scheme for the direct numerical simulation of 3D transitional and 
developed turbulent flow is presented. Explicit and implicit time integration schemes 
for the compressible Navier-Stokes equations are compared. The nonlinear system 
resulting from the implicit time discretization is solved with an iterative method and 
accelerated by the application of a multigrid technique. Since we use central spatial 
discretizations and no artificial dissipation is added to the equations, the smoothing 
method is less effective than in the more traditional use of multigrid in steady-state 
calculations. Therefore, a special prolongation method is needed in order to obtain 
an effective multigrid method. 

This simulation scheme was studied in detail for compressible flow over a flat plate. 
In the laminar regime and in the first stages of turbulent flow the implicit method 
provides a speed-up of a factor 2 relative to the explicit method on a relatively coarse 
grid. At increased resolution this speed-up is enhanced correspondingly. 

INTRODUCTION 


Multigrid methods have proven to be very successful when computing steady 
solutions to the Reynolds-averaged Navier-Stokes equations [6,12]. In these equations 
a turbulence model is introduced and an approximation for the mean turbulent flow 
field is obtained. Many turbulent flows are only statistically stationary, however, 
and the actual solution is strongly time dependent. The development of numerical 
simulation methods for the time accurate simulation of turbulent flow forms a subject 
of intensive research (see [3,4,9,10,13]). In particular the transition from laminar to 
turbulent flow and the early stages of fully developed turbulence in relatively simple 
geometries are presently accessible to time-accurate numerical simulation. 

*This work was supported by the J.M. Burgers Centre and by the Netherlands Organization for 
Scientific Research (NWO). 
Direct numerical simulation (DNS) forms a key tool for computing detailed and reliable
results for turbulent flow in simple geometries, which can subsequently be used in the
validation of numerical methods and sub-grid models for large eddy simulations. In this
paper we focus on an efficient higher order accurate method for DNS of compressible
flow. Results will be illustrated for the compressible flow over a flat plate.

Because of the large variety of length scales present in high-Reynolds turbulent 
flows, a large number of grid points is required. The grid should be chosen such that 
the relevant modes with smallest length scales can still be adequately represented, 
resulting in very fine meshes. The time step is limited by accuracy requirements 
and stability conditions unless an absolutely stable time integration method is ap- 
plied. In general, the stability conditions are far more restrictive than the accuracy 
requirements, especially in the laminar regime. Stability conditions for explicit time 
integration methods lead to a linear relation between the grid size and the time step if 
the convective terms in the equations are most restrictive. Thus, the required number
of time steps is proportional to n (the number of grid points in each grid direction). In
principle, absolutely stable (thus implicit) time integration methods are more suitable
for this type of problem, since no stability restrictions are imposed on the time step. 
However, implicit methods are more expensive per time step. Hence, effective tech- 
niques are required for a fast solution of the nonlinear system of equations resulting 
from the implicit time discretizations in order to render these methods useful. 

Summarizing, the following dilemma is observed. Application of explicit time
integration methods leads to a large number of (relatively cheap) time steps, with a
total number of operations $a\,n^4$. With an implicit scheme a relatively small number
of (expensive) implicit time steps is required, leading to $b\,n^3$ operations. However, in
general b is considerably larger than a.
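
The trade-off can be made concrete with the two cost models above. In the sketch below the constants a and b are purely hypothetical placeholders (the text gives no values); the point is only that the implicit approach becomes cheaper once n exceeds b/a, and that its advantage then grows linearly with n.

```python
def explicit_cost(n, a):
    """Total operations of the explicit scheme: n^3 points times O(n) time steps."""
    return a * n ** 4


def implicit_cost(n, b):
    """Total operations of the implicit scheme: step count independent of n."""
    return b * n ** 3


def break_even_n(a, b):
    """Grid size per direction above which the implicit scheme is cheaper."""
    return b / a


a, b = 1.0, 40.0  # hypothetical cost constants, for illustration only
for n in (32, 64, 128):
    ratio = explicit_cost(n, a) / implicit_cost(n, b)
    print(n, ratio)  # ratio > 1 means the implicit scheme wins
```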

The main purpose of our study is the development of efficient tools for solving 
the system of equations that arises from application of an implicit time integration 
scheme to the compressible Navier-Stokes equations. The method presented in this 
paper is based on the work by Jameson [6,7] and Melson et al. [12]. For Reynolds- 
averaged Navier-Stokes (RaNS) equations they have proposed an iterative-implicit 
method which is based on a multigrid technique, leading to a considerable speed-up 
in comparison with explicit methods. The RaNS equations contain a lot of dissipation, 
which leads to a fast convergence of the relaxation method. In our equations, however, 
the dissipation is very small. As a result, we obtain a smaller speed-up than Melson 
et al. 

This paper is set up as follows. In the following section, a general description of 
the equations is given. Numerical solution techniques for the problem (including a 
multigrid technique) are presented in section 3. In section 4 computational results 
will be presented. Finally, we will give some conclusions. 
GOVERNING EQUATIONS


The equations describing the flow are the well-known Navier-Stokes equations,
which represent conservation of mass, momentum and energy. In terms of dimensionless
variables (density $\rho$, velocity components $u_j$ and energy density $e$) these
equations have the form (the summation convention is used):

    $\partial_t \rho + \partial_j(\rho u_j) = 0$    (1)

    $\partial_t(\rho u_k) + \partial_j(\rho u_k u_j) + \partial_k p - \partial_j \sigma_{kj} = 0, \quad k = 1,2,3$    (2)

    $\partial_t e + \partial_j((e + p)u_j) - \partial_j(\sigma_{ij} u_i - q_j) = 0$    (3)


Here $\partial_t$ and $\partial_j$ denote partial differentiation with respect to time and
the coordinate $x_j$, respectively. The pressure $p$ is given by

    $p = (\gamma - 1)\left(e - \tfrac{1}{2}\rho u_i u_i\right)$    (4)


in which $\gamma$ denotes the adiabatic gas constant, which is set to $\gamma = 1.4$. The
viscous stress tensor $\sigma_{ij}$ is a function of the velocity components $u_i$:

    $\sigma_{ij} = \frac{1}{Re}\left(\partial_j u_i + \partial_i u_j - \tfrac{2}{3}\delta_{ij}\,\partial_k u_k\right)$    (5)

where Re is the Reynolds number (the fluid viscosity is taken constant). Furthermore,
$q_j$ represents the viscous heat flux, given by

    $q_j = -\frac{1}{(\gamma - 1)\,Re\,Pr\,M_r^2}\,\partial_j T$    (6)


where Pr is the Prandtl number, for which we use Pr = 0.72, and $M_r$ is the reference
Mach number. The temperature T is given by the ideal gas law:

    $T = \gamma M_r^2\,\frac{p}{\rho}$    (7)


In the Navier-Stokes equations (1-3), two types of fluxes can be distinguished. 
The convective fluxes consist of the first order spatial derivatives in the Navier-Stokes 
equations. These are of hyperbolic type and in Von Neumann analysis of the linearized 
equations, they give rise to imaginary eigenvalues. The viscous fluxes are parabolic 
and add dissipation to the system. This dissipation is $O(1/Re)$.

The behavior of the solution of the Navier-Stokes equations can roughly be
characterized as follows. The nonlinear terms in the convective fluxes provide a continuous
generation of modes with a small length scale from the components with a larger
length scale. On the other hand, the dissipative fluxes add a certain damping to the
system. This damping is very small for the components with a large length scale,
but it is larger for the small-length-scale components. In the transitional stage from
laminar to turbulent flow, small disturbances in a laminar flow give rise to growth of
large-scale eddies (which correspond to the most unstable modes in linear stability
theory). These eddies generate eddies with smaller length scales. This continuous
flow of energy to the eddies with smaller length scales is truncated at the scale where
the dissipation counterbalances the growth effects, so that a statistically stationary
turbulent flow is obtained. This “viscous length scale” strongly depends on the
Reynolds number. In the turbulent regime a broad spectrum of different modes in
the flow develops.


NUMERICAL METHOD 


In this section the discretization of the spatial derivatives and the explicit and 
iterative-implicit time integration methods for our problem will be discussed. 


Spatial discretization 


For the spatial derivatives in the equations, fourth order accurate central five- 
point difference schemes are used. Since artificial dissipation may seriously influence 
the solution during the transition from laminar to turbulent flow, the schemes are 
devised in such a way that no artificial dissipation is required. Odd-even decoupling 
is prevented by using a filtering procedure that just eliminates the shortest modes, 
see e.g. [4]. 
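
As an illustration of the kind of scheme meant here, the sketch below applies the standard fourth order central five-point formula for a first derivative on an equidistant grid. It is a generic textbook stencil, assumed to be representative; the paper's full discretization (including its filtering procedure) is not reproduced, and the function name is hypothetical.

```python
import math


def d1_central4(f, x, h):
    """Fourth order central five-point approximation of f'(x) with spacing h."""
    return (f(x - 2 * h) - 8 * f(x - h) + 8 * f(x + h) - f(x + 2 * h)) / (12 * h)


x0 = 0.7
for h in (0.1, 0.05, 0.025):
    err = abs(d1_central4(math.sin, x0, h) - math.cos(x0))
    print(h, err)  # the error should drop by roughly 16 per halving of h
```

The stencil is antisymmetric, so it introduces no numerical dissipation of its own, which is in line with the requirement stated above.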


Explicit time integration 


After discretizing the spatial derivatives in the governing equations, the equations
take the following form (with discrete state vector U):

    $\partial_t U + f(U) = 0$    (8)

In the numerical solution of this problem, we denote the numerical solution at time
level $t_n$ by $U^n$.

We have implemented a second order compact-storage four-stage Runge-Kutta
method. The method is suitable for our problem, since the stability region contains a
considerable part of the imaginary axis (up to $2\sqrt{2}\,i$). Thus, this method gives
stable results if the size of the time step satisfies the CFL condition:

    $\Delta t\,\lambda_m < C_{CFL}$    (9)

with $C_{CFL} \approx 2\sqrt{2}$. The largest eigenvalue $\lambda_m$ of the discrete
linearized convective flux is given by

    $\lambda_m = A\left(\frac{|u_1|}{\Delta x_1} + \frac{|u_2|}{\Delta x_2} + \frac{|u_3|}{\Delta x_3} + \sqrt{\frac{\gamma p}{\rho}}\,\sqrt{\frac{1}{\Delta x_1^2} + \frac{1}{\Delta x_2^2} + \frac{1}{\Delta x_3^2}}\right)$    (10)

with $A = \tfrac{2}{3}\sqrt{-6 + 4\sqrt{6}} + \tfrac{1}{6}\sqrt{-39 + 16\sqrt{6}} \approx 1.37$ if fourth
order accurate central five-point finite difference approximations are used on an
orthogonal equidistant grid. Thus, increasing the number of grid points leads to a
proportional reduction of the time step.
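
The value of the constant A can be checked numerically. The sketch below assumes the standard fourth order five-point stencil with weights (1, -8, 0, 8, -1)/12, whose Fourier symbol for a mode with phase angle theta per grid cell is i (8 sin(theta) - sin(2 theta))/6 divided by the mesh size; A is then the maximum of that expression over theta. The closed form used for comparison is $\tfrac{2}{3}\sqrt{-6 + 4\sqrt{6}} + \tfrac{1}{6}\sqrt{-39 + 16\sqrt{6}}$.

```python
import math


def symbol(theta):
    """Scaled imaginary part of the Fourier symbol of the five-point stencil."""
    return (8 * math.sin(theta) - math.sin(2 * theta)) / 6


# Brute-force maximum over theta in [0, pi].
samples = 200000
A_numeric = max(symbol(k * math.pi / samples) for k in range(samples + 1))

# Closed-form value for comparison.
A_closed = (2 / 3) * math.sqrt(-6 + 4 * math.sqrt(6)) \
    + (1 / 6) * math.sqrt(-39 + 16 * math.sqrt(6))

print(A_numeric, A_closed)
```

Both values agree with the quoted 1.37, compared with a maximum of 1 for the second order central difference, so the fourth order stencil tightens the explicit time step limit by that factor.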


Iterative-implicit time integration methods 


In order to speed up the solution method, an implicit time integration scheme
has been applied. A-stable methods (i.e., those whose region of stability includes
the whole of the left half-plane, also referred to as absolutely stable methods) are
preferred, so that the time step is not restricted by stability conditions, but only by
physics. An iterative procedure is applied for the solution of the system of equations
resulting from the implicit scheme; the approach is therefore called an
iterative-implicit method.

We will only consider implicit linear multistep schemes. Because of the com- 
plexity of the equations and the large number of points involved in our problems, 
more advanced schemes are not considered here. The order of A-stable linear multi- 
step methods cannot exceed two (see [2,11]). Suitable methods are Backward Euler, 
the Trapezoidal Rule, and the two step Backward Differentiation Formula, BDF(2). 
Since Backward Euler is only first order accurate and generates considerable numeri- 
cal damping, we have decided not to use it. The Trapezoidal Rule is the second-order 
A-stable linear multistep method with smallest error constant [2,11]. However, since 
periodic eigenfunctions are not damped, extra smoothing is required in many appli- 
cations. The BDF(2) method is preferred, because it is less sensitive and has a larger 
stability region. 

For eq.(8), BDF(2) with constant $\Delta t$ is defined by

    $3U^{n+1} - 4U^n + U^{n-1} = -2 f(U^{n+1})\,\Delta t$    (11)

In order to solve eq.(11), it is written as

    $a_0 V + f(V) = g$    (12)

where V stands for the unknown solution $U^{n+1}$, $g = (4U^n - U^{n-1})/(2\Delta t)$ is a
known forcing function and $a_0 = 3/(2\Delta t)$ for this time integration method. Other
implicit time integration schemes can be cast in the same form (12) by adjusting the
constants and forcing functions.
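
To see the form (12) at work, the sketch below applies BDF(2) to a scalar model problem dU/dt + f(U) = 0 with the linear flux f(U) = lam * U, which is an assumption made only for illustration (the actual flux is the nonlinear discretized Navier-Stokes operator). For a linear flux, eq.(12) can be solved in closed form, so no iteration is needed, and the second order accuracy shows up as an error ratio close to four when the step is halved. The function names are hypothetical.

```python
import math


def bdf2_step(u_n, u_nm1, dt, lam):
    """One BDF(2) step written as a0*V + f(V) = g, with f(V) = lam*V."""
    a0 = 3.0 / (2.0 * dt)
    g = (4.0 * u_n - u_nm1) / (2.0 * dt)
    return g / (a0 + lam)  # exact solution of the linear version of eq.(12)


def error_at(t_end, dt, lam=1.0):
    """Error at t_end against exp(-lam*t), starting from two exact time levels."""
    steps = round(t_end / dt)
    u_nm1, u_n = 1.0, math.exp(-lam * dt)
    for _ in range(steps - 1):
        u_nm1, u_n = u_n, bdf2_step(u_n, u_nm1, dt, lam)
    return abs(u_n - math.exp(-lam * t_end))


e_coarse, e_fine = error_at(1.0, 0.02), error_at(1.0, 0.01)
print(e_coarse, e_fine, e_coarse / e_fine)
```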

An iterative method for solving eq.(12) consists of the following steps: 

1. computation of a starting solution V°, 

2. relaxation method for improving the solution, 

3. truncation of the relaxation method if the desired accuracy is achieved. 

Our approach for these steps will be presented below. 
Starting solution


It is clear from the above that the solutions at two previous time levels are re- 
quired to calculate the solution at a new time level. This is inherent to the second 
order accurate discretization of the time derivative in eq.(ll). The availability of 
solutions at previous time levels can also be exploited to obtain a good starting so- 
lution at the new time level. The better the starting solution corresponds with the 
solution to eq.(ll), the smaller the amount of work that is necessary to calculate the 
solution within the required accuracy. A suitable starting solution is obtained from 
extrapolation of the solution from previous time levels. For constant time steps $\Delta t$
quadratic extrapolation yields

    $V^0 = 3U^n - 3U^{n-1} + U^{n-2}$    (13)

Another second order extrapolation method uses the time derivative of U given by
the function f:

    $V^0 = U^{n-1} - 2 f(U^n)\,\Delta t$    (14)

A second order extrapolation method whose truncation error shares a number of terms
with the truncation error of the BDF(2) formula is

    $V^0 = \tfrac{1}{3}\left(4U^n - U^{n-1} - 4 f(U^n)\,\Delta t + 2 f(U^{n-1})\,\Delta t\right)$    (15)

The truncation errors of eq.(14) and BDF(2) are very different. As a result, more 
relaxations are required if extrapolation (14) is used than if eq.(13) or eq.(15) is 
applied. The choice of either eq.(13) or eq.(15) does not have a large influence on the 
required number of relaxations. 


Iteration method; application of multigrid 


A standard method to solve equations of the form (12) is the Newton-Raphson
iteration method. In this method, a linearization of the flux vector around the known
state $U^n$ is used, see e.g. [13]. However, application of this method comes at the
expense of a large matrix inversion.

Multigrid methods [1,5] are often applied for the efficient computation of steady-state
solutions to RaNS equations. Because of the large number of grid points and
the large variety of typical length scales in the solution, application of these methods
leads to significant accelerations compared to classical iteration methods if suitable
smoothing methods are used. In fact, our problem (12) is of the same form; hence we
can utilize the same technique at each time step. This is described below.
Transfer operators


The solution is restricted to coarser grids by injection, and the defect vector by 
full weighting. 

A special treatment of the prolongation is required. Basically, the correction
is prolonged to the finer grid by means of trilinear interpolation. This prolongation
operator works well for stationary flow simulations, since the fine grid operator indeed
satisfies the requirement of damping the high frequency components in the error. In
the present solver, however, the high frequency components which may be created
by the prolongation process are damped very slowly, since the discretization method
does not contain artificial damping. Therefore, after every prolongation a filtering
operator is applied to the correction, which eliminates the shortest modes.
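
The transfer operators can be illustrated in one dimension on a periodic grid. The sketch below is a simplified analogue under stated assumptions: injection, full weighting and linear interpolation stand in for their three-dimensional counterparts, and the (1/4, 1/2, 1/4) filter is a hypothetical choice that annihilates the shortest (odd-even) mode; the filter actually used in the paper is not specified here.

```python
def inject(fine):
    """Restrict the solution: coarse point i coincides with fine point 2i."""
    return fine[::2]


def full_weighting(fine):
    """Restrict the defect with weights (1/4, 1/2, 1/4) on a periodic grid."""
    n = len(fine)
    return [0.25 * fine[(2 * i - 1) % n] + 0.5 * fine[2 * i]
            + 0.25 * fine[(2 * i + 1) % n] for i in range(n // 2)]


def prolong_linear(coarse):
    """Prolong a correction by linear interpolation on a periodic grid."""
    m = len(coarse)
    fine = [0.0] * (2 * m)
    for i in range(m):
        fine[2 * i] = coarse[i]
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[(i + 1) % m])
    return fine


def filter_shortest(v):
    """Hypothetical filter: removes the odd-even mode, keeps constants."""
    n = len(v)
    return [0.25 * v[(i - 1) % n] + 0.5 * v[i] + 0.25 * v[(i + 1) % n]
            for i in range(n)]


sawtooth = [(-1) ** i for i in range(8)]  # shortest mode on the fine grid
print(filter_shortest(sawtooth))          # the filter removes it completely
print(filter_shortest(prolong_linear([0.0, 1.0, 0.0, 0.0])))
```

Applying the filter right after the prolongation means the smoother never has to remove the shortest modes itself, which matters here because the scheme has no artificial damping to do that job.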


Smoothing method 


The rapidly varying eigenfunctions (so-called rough eigenfunctions, see [14]) cannot
be represented well on coarse grids. Therefore, an effective smoothing method is
required.

A common technique for the computation of steady-state solutions to the Navier-Stokes
equations is solving the time dependent equations with multistage methods
(see e.g. [7]). We have chosen a similar approach for our problem (12). In order to
solve (12), we find the steady state solution of the following pseudo time evolution
equation:

    $\partial_\tau V + a_0 V + f(V) = g$    (16)

The advantage of writing the problem in this form is that it has the good stability 
properties of the implicit time integration method, whereas the flexibility of explicit 
time integration schemes is maintained. Furthermore, convergence acceleration meth- 
ods can be applied in a manner similar to steady-state calculations. 

We have chosen the following second order accurate Runge-Kutta method:

    $V_0 = V^m$
    $(1 + \tfrac{1}{4} a_0 \Delta\tau)\,V_1 = V_0 - \tfrac{1}{4}\,(f_c(V_0) + f_d(V_0) - g)\,\Delta\tau$
    $(1 + \tfrac{1}{6} a_0 \Delta\tau)\,V_2 = V_0 - \tfrac{1}{6}\,(f_c(V_1) + f_d(V_0) - g)\,\Delta\tau$
    $(1 + \tfrac{3}{8} a_0 \Delta\tau)\,V_3 = V_0 - \tfrac{3}{8}\,(f_c(V_2) + f_d(V_2) - g)\,\Delta\tau$
    $(1 + \tfrac{1}{2} a_0 \Delta\tau)\,V_4 = V_0 - \tfrac{1}{2}\,(f_c(V_3) + f_d(V_2) - g)\,\Delta\tau$    (17)
    $(1 + a_0 \Delta\tau)\,V_5 = V_0 - (f_c(V_4) + f_d(V_4) - g)\,\Delta\tau$
    $V^{m+1} = V_5$

in which $f = f_c + f_d$. The dissipative part ($f_d$) of the flux vector is only evaluated
at a few stages, as proposed by e.g. Jameson [7, 8]. This both saves calculation time
and increases the stability.
Furthermore, in this time stepping scheme, the linear term $a_0 V$ in eq.(16) is treated
implicitly. This is easily possible, as the term is diagonal, and useful, since it improves
the stability of the pseudo time stepping method: since $a_0 > 0$ the stability function
is modified, which leads to a larger stability region and a considerable reduction of
the amplification factor, see e.g. [12].

If $a_0 = 0$, the CFL number can be taken to be 4.0. From a linearization of the
stability function around $\Delta\tau\,\lambda_m = 4.0$, the following conditions can be
derived for small $a_0/\lambda_m > 0$:

At (X m — a 0 ) < 4.0 , . 

At (A m + j^a 0 ) <4.5 ^ ' 
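
The CFL limit of 4.0 for $a_0 = 0$ can be checked on the linear model problem dV/dtau = (z/dtau) V with purely imaginary z, which mimics the convective flux alone. The sketch below assumes the stage coefficients 1/4, 1/6, 3/8, 1/2, 1 (the classical five-stage choice associated with Jameson) and ignores the dissipative part entirely; both are assumptions made for this illustration, and the function names are hypothetical.

```python
ALPHAS = (1 / 4, 1 / 6, 3 / 8, 1 / 2, 1.0)  # assumed stage coefficients


def amplification(z, alphas=ALPHAS):
    """Amplification factor of the multistage scheme for a0 = 0 and f = -z*V/dtau."""
    v = 1.0 + 0j
    for alpha in alphas:
        # Each stage restarts from V_0 = 1 and uses the previous stage value.
        v = 1.0 + alpha * z * v
    return v


def max_growth(y_max, samples=4000):
    """Largest |G(i*y)| for y sampled in [0, y_max]."""
    return max(abs(amplification(1j * y_max * k / samples))
               for k in range(samples + 1))


print(max_growth(4.0))            # stays at or below one up to the limit
print(abs(amplification(4.2j)))   # exceeds one beyond it
```

On the imaginary axis the amplification factor reaches modulus one at z = 4i and grows beyond it, which is consistent with the quoted limit for the undamped case.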


Other convergence acceleration techniques 


The convergence is further accelerated by the application of local pseudo time
stepping. Since a steady state equation is solved, the pseudo time step $\Delta\tau$ need
not be equal in each point. The maximum allowed $\Delta\tau$ is chosen in each point from
eq.(18).

At this moment, we are testing a Newton-Raphson type approach. This approach
is based on a linearization of the function f around the solution $V^m$ (see eq.(16)).
Using upwind discretizations for modifications of the solution, the characteristic
variables associated with the largest eigenvalue are then solved implicitly. First test
results indicate that a speed-up of 10 to 20% can be obtained.


Truncation of the relaxation process 


For the time being we have chosen the following approach. The iterations are 
truncated if the residual is below a prescribed value. The maximum value is chosen so 
that the truncation error is smaller than the discretization error of the time integration 
method. 


APPLICATION TO A TRANSITIONAL FLOW 


In this section we present some results from application of the techniques described 
above to a transitional wall-bounded flow. 

The flow is computed in a rectangular domain, with no-slip isothermal wall condi- 
tions at the wall, symmetry conditions at the upper boundary, and periodicity in the 
horizontal directions. The initial solution consists of the similarity solution for a com- 
pressible boundary layer combined with a small-amplitude disturbance, consisting of 
a number of unstable modes which are obtained from linear stability theory. 

The computations presented here were done with the iterative-implicit time integration
method. In the multigrid process, V-cycles are used with 1 pre-relaxation
and 2 post-relaxations. The relaxation process is truncated if the residual has become
less than a certain prescribed value. This value is chosen such that the resulting
truncation error is smaller than the errors due to the discretizations.

The equations are discretized on a domain with 64 x 64 x 64 grid points, which is 
adequate at the stage of transition and quite coarse in the turbulent regime. 


Discretization errors 


First we will show that in the laminar regime the small time steps required for 
stability of the explicit time integration method are not necessary for accuracy. 

In the laminar regime, only disturbances with relatively large length-scales are 
present. The following table shows the relative errors due to the spatial discretization
for the growth rate of the most unstable mode for different grid densities.
$\varepsilon_1$ and $\varepsilon_2$ are the relative errors of the growth rate at two
different locations: in the boundary layer and further in the domain, respectively.


      n        $\varepsilon_1$     $\varepsilon_2$

      32       0.05                0.005
      64       0.005               0.0003
      128      0.0003              0.0000

Table 1

Relative errors in growth rate of most unstable mode
due to the spatial discretization, at two locations in the flow
(n stands for the number of grid points in each grid direction)

The results in Table 1 show the 4th order accuracy of the spatial discretization 
method. 
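
As a rough cross-check of the fourth-order claim, the observed order of accuracy can be estimated from successive rows of Table 1 as the base-2 logarithm of the error ratio, since the grid spacing is halved from one row to the next. The following Python sketch does this. The helper name `observed_order` is ours, not the authors', and because the tabulated errors are rounded to one significant digit the estimates are only indicative.

```python
import math

# Relative growth-rate errors copied from Table 1:
# n (grid points per direction) -> (eps_1, eps_2).
table1 = {
    32: (0.05, 0.005),
    64: (0.005, 0.0003),
    128: (0.0003, 0.0000),
}


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Observed order p, assuming err ~ C * h**p and h reduced by `refinement`."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


# eps_1 between n = 32 and n = 64, and between n = 64 and n = 128.
p1_32_64 = observed_order(table1[32][0], table1[64][0])
p1_64_128 = observed_order(table1[64][0], table1[128][0])

# eps_2 between n = 32 and n = 64 (the n = 128 entry is rounded to zero,
# so no ratio can be formed with it).
p2_32_64 = observed_order(table1[32][1], table1[64][1])

print(p1_32_64, p1_64_128, p2_32_64)
```

With the tabulated values the estimates are about 3.3 for the first pair of grids and about 4.1 for the other two ratios, which is compatible with fourth-order behavior given the rounding of the data.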

Discretization errors due to the time integration are given in Table 2. 


      $\Delta t$    $\varepsilon_1$     $\varepsilon_2$

      0.05          0.0000              0.0000
      0.10          0.0002              0.0000
      0.25          0.001               0.0003
      0.50          0.003               0.001
      1.00          0.01                0.005

Table 2

Relative errors in growth rate of most unstable mode
due to the time discretization, at two locations in the flow


It is clear from these data that for a 64 x 64 x 64 grid, a time step $\Delta t = 0.50$
may be chosen, which leads to an error in the growth rate due to the time integration
that is comparable to the error arising from the spatial discretization. This value
contrasts sharply with the time step limit for the explicit time integration method:
for this grid $\Delta t < 0.04$ is required. The large discrepancy between these two values
of the time step is the main motivation for this study. At a later stage, when modes
with smaller length scales become more important, this discrepancy becomes smaller.
This is sketched in Figure 1.

Since the number of operations per pseudo time step is proportional to the num- 
ber of grid points, the amount of work done on the coarse grids can be neglected. 
Furthermore, the required amount of CPU time for one explicit time step is approx- 
imately equal to the time for one pseudo time step with eq.(17) on the finest level. 
Thus, the ratio of the CPU time per implicit and explicit time step is the measured 
number of pseudo time steps on the finest grid. Typical numbers will be given in the 
following subsection. 



Figure 1: Typical behavior of the time step limit based on accuracy and stability 
requirements in the transition from laminar to turbulent flow 

From a comparison of numerical results obtained with the explicit and the implicit 
scheme, we conclude that the differences are very small in the laminar regime. Figure 
2 illustrates that with fixed time step $\Delta t = 0.5$ the results are accurate up to $t \approx 2250$.
The accuracy of the implicit time integration method is increased if smaller values 
are used for the implicit time step. Thus, choosing time steps that are larger than the 
value prescribed by the stability condition for the explicit time integration method is 
allowed in the laminar regime. 


Comparison of the efficiency of the explicit and implicit method 

Various criteria can be used for determining the size of the time step. The
application of the iterative-implicit time integration method is illustrated with the
following choices: (a) choose a fixed time step ($\Delta t = 0.5$), or (b) vary the time step
so that the system of equations can be solved in about 2 V-cycles. Figure 3 shows a
comparison of the work involved with both choices and with the explicit time
integration method.

Figure 2: (a) Comparison of the development of the maximum value of the second
vorticity component with explicit time integration (solid line) and with the implicit
method ($\Delta t = 0.5$). (b) Relative error due to the large time step

The differences between the amount of work with fixed and variable time steps
(both with the iterative-implicit time integration method) can be explained as follows.
The iterative-implicit time integration method outlined above is similar to a predictor-
corrector method. Predictor-corrector methods have a bounded stability region. The
stability can be increased by choosing smaller time steps or by increasing the number
of corrections. For a large number of predictor-corrector methods, the length of the
part of the imaginary axis in the stability region increases less than linearly with the
number of corrections; see, e.g., the stability regions for Adams-Bashforth methods
in [11]. Therefore, we expect that reducing the time step size $\Delta t$ is cheaper than
increasing the number of cycles.

The stability requirements for the explicit time integration method lead to a
maximum time step $\Delta t$ of about 0.04, so that 250 explicit time steps are needed per
10 time units. Apparently, the implicit time integration scheme is cheaper than the
explicit scheme if the implicit time step is sufficiently reduced. In our experience,
for t > 2400 the grid should be refined to represent the shortest modes sufficiently;
again, our implicit scheme is more efficient than the explicit scheme.
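
The work comparison can be made concrete with a small calculation. The Python sketch below uses only figures stated in the text: an explicit step limit of 0.04, a fixed implicit step of 0.5, and the observation that one pseudo time step on the finest grid costs about as much as one explicit step, with coarse-grid work neglected. The function names and the sample pseudo-step count are hypothetical and serve only as illustration; measured counts are what Figure 3 reports.

```python
def explicit_steps(interval, dt_explicit):
    """Number of explicit time steps needed to cover `interval` time units."""
    return interval / dt_explicit


def implicit_work(interval, dt_implicit, pseudo_steps_per_step):
    """Work of the implicit method, in units of one explicit step.

    Assumes one pseudo time step on the finest grid costs about as much as
    one explicit time step and that coarse-grid work is negligible.
    """
    return interval / dt_implicit * pseudo_steps_per_step


interval = 10.0       # time units, as used in Figure 3
dt_explicit = 0.04    # stability limit of the explicit method
dt_implicit = 0.5     # fixed implicit time step, choice (a)

n_explicit = explicit_steps(interval, dt_explicit)    # 250 steps
n_implicit = interval / dt_implicit                   # 20 implicit steps
break_even = n_explicit / n_implicit                  # 12.5 pseudo steps

# Hypothetical example: 10 pseudo time steps per implicit step.
sample_work = implicit_work(interval, dt_implicit, 10)

print(n_explicit, n_implicit, break_even, sample_work)
```

Under these assumptions the fixed-step implicit method is cheaper than the explicit method whenever fewer than 12.5 pseudo time steps are needed, on average, per implicit time step.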




Figure 3: Number of pseudo time steps per 10 time units on the finest grid with (a) 
a fixed time step and (b) a variable time step compared with (c) the number of time 
steps for the explicit time integration method. In this time interval, the transition 
from laminar to turbulent flow occurs. 

Finally, it is noted that if the multigrid method is not used, solving the system 
of equations takes about 5 times more CPU time. Thus, the use of this method is 
decisive for the success of the implicit scheme. 

CONCLUSIONS 


We have compared the application of an explicit and an iterative-implicit time 
integration scheme to time-accurate DNS of compressible turbulent flow. Convergence 
acceleration techniques such as multigrid are crucial for an effective iterative solution 
of the system of equations. For the application presented in this paper, the iterative- 
implicit method is faster than the explicit solver. However, due to the small amount 
of dissipation in the equations a smaller speed-up is obtained than in methods for the 
RaNS equations.

FIRST-ORDER SYSTEM LEAST SQUARES FOR
VELOCITY-VORTICITY-PRESSURE FORM OF THE STOKES
EQUATIONS, WITH APPLICATION TO LINEAR ELASTICITY

ZHIQIANG CAI*, THOMAS A. MANTEUFFEL*, AND STEPHEN F. McCORMICK*

Abstract. In this paper, we study the least-squares method for the generalized Stokes equations
(including linear elasticity) based on the velocity-vorticity-pressure formulation in $d = 2$ or $3$
dimensions. The least-squares functional is defined in terms of the sum of the $L^2$- and
$H^{-1}$-norms of the residual equations, which is similar to that in [6], but weighted appropriately
by the Reynolds number. Our approach for establishing ellipticity of the functional does not use
ADN theory, but is founded more on basic principles. We also analyze the case where the
$H^{-1}$-norm in the functional is replaced by a discrete functional to make the computation
feasible. We show that the resulting algebraic equations can be uniformly preconditioned
by well-known techniques.

Key words. least squares, Stokes

AMS(MOS) subject classifications. 65F10, 65F30

1. Introduction. Recently, there has been substantial interest in the use of least-squares
principles for numerical approximation of the incompressible Stokes and Navier-Stokes
equations, especially those based on vorticity (more precisely, velocity-vorticity-pressure);
for example, see [5, 12, 13, 14, 19]. Its attractions include accurate approximation
of meaningful physical quantities, formulation of a well-posed minimization principle,
elimination of the need for artificial stabilization techniques, and freedom in the choice of
finite element spaces (which are not subject to the LBB condition). The computational results
provided in these papers indicate that such methods have great promise. However, they do
not yield optimally accurate approximations for the case of Dirichlet boundary conditions
(see the analysis in [6]). In recent work by Bochev and Gunzburger [6], the ADN approach
(see [2]) was extended to the vorticity formulation of the Stokes equations with rigorous error
analysis. The least-squares functional is defined to be the sum of squares of the norms of the
residual of each equation, where the norms are determined by the indices assigned to each
equation by the ADN theory (see [1]). To be specific, consider the two-dimensional stationary
Stokes equations with Dirichlet boundary conditions. Then ADN theory was used in [6]
to show that the least-squares functional

    $\|\mathbf{f} - (\nu\nabla^{\perp}\omega + \nabla p)\|_q^2 + \|\nabla\times\mathbf{u} - \omega\|_{q+1}^2 + \|\nabla\cdot\mathbf{u}\|_{q+1}^2$

is equivalent to the sum of squared norms of each variable,
$\|\mathbf{u}\|_{q+2}^2 + \|\omega\|_{q+1}^2 + \|p\|_{q+1}^2$, for all $q \in \mathbb{R}$ and $\mathbf{f} = \mathbf{0}$.
In particular, they consider the above functional with $q = 0$, then replace the $H^1$-norms by
mesh-dependent $L^2$-norms, $h^{-2}\|\cdot\|_0^2$ (see also [2]). This mesh-dependent
least-squares approach yields optimally accurate approximations for each variable
with respect to approximation subspaces. However, it is not clear that an optimal solution
algorithm for the resulting discrete equations can be developed at this stage of research,
even though the matrix is symmetric and positive definite.

In this paper, we consider a least-squares functional similar to that in [5] with $q = -1$,
but weighted appropriately by the Reynolds number. This is designed for the vorticity
formulation of the pressure-perturbed variant of the generalized Stokes equation (which
includes linear elasticity) with Dirichlet boundary conditions in two and three dimensions.
Instead of applying ADN theory, we directly establish ellipticity and continuity of the
functional in a product norm involving Re and the $L^2$- and $H^1$-norms. The $H^{-1}$-norm
in the functional is further replaced by the discrete $H^{-1}$-norm to make the computation
feasible, following the discrete $H^{-1}$ least-squares approach proposed by Bramble, Lazarov,
and Pasciak [3] for scalar second-order elliptic equations. Such discrete $H^{-1}$ functionals
are shown to be uniformly equivalent to the Sobolev norms weighted by the Reynolds
number. This property enables us to show that standard finite element discretization error
estimates are optimal with respect to the order of approximation as well as the required
regularity of the solution, and that they are uniform in the Reynolds number. Moreover,
the resulting discrete equations can be preconditioned by multigrid associated with velocity
and by diagonal matrices associated with vorticity and pressure uniformly well with respect
to the Reynolds number, the mesh size, and the number of levels.

The paper is organized as follows. Section 2.1 introduces the (generalized) Stokes
equations, the vorticity formulation, and some preliminary results. We introduce the
least-squares functional weighted appropriately by $\nu$ for the vorticity system, then establish
its ellipticity and continuity in Section 2.2. Section 3 discusses the finite element approximation
and Section 4 considers the discrete $H^{-1}$-norm least-squares functional and solution method
for the resulting system of linear equations.

2. Formulations of Least-Squares Functionals. In this section, we describe the 
weighted least-squares functional for the vorticity formulation and show its ellipticity and 
continuity in the appropriate Hilbert spaces. In Subsection 2.1, we start by defining the 
(generalized) Stokes equation and its vorticity formulation; we next give some notation for 
Sobolev spaces, the divergence and curl related Hilbert spaces, and their norms; we then 
include some preliminary results of functional analysis. In Subsection 2.2, we introduce
a least-squares functional weighted appropriately by the Reynolds number, then directly 
show its ellipticity and continuity. 

2.1. The Stokes Equation and Its Vorticity Formulation. Let $\Omega$ be a bounded
open domain in $\mathbb{R}^d$ ($d = 2$ or $3$) with Lipschitz boundary $\partial\Omega$. The
pressure-perturbed form of the generalized stationary Stokes equation in dimensionless
variables may be written as

(2.1)    $-\nu\Delta\mathbf{u} + \nabla p = \mathbf{f}$    in $\Omega$,
         $\nabla\cdot\mathbf{u} + \delta p = 0$    in $\Omega$,

where the symbols $\Delta$, $\nabla$, and $\nabla\cdot$ stand for the Laplacian, gradient, and divergence
operators, respectively; $\mathbf{f}$ is a given vector function; $\nu$ is the reciprocal of the Reynolds
number Re; and $\delta$ is some nonnegative constant ($\delta = 0$ for Stokes and $\delta = 1$
for linear elasticity, with $\nu$ then determined by the (positive) Lamé constants $\mu$ and
$\lambda$). For more details on linear elasticity, see [6]. We consider the (generalized) Stokes
equations (2.1) together with the Dirichlet velocity boundary condition

(2.2)    $\mathbf{u} = \mathbf{0}$    on $\partial\Omega$

and the mean pressure condition

(2.3)    $\int_\Omega p \, dx = 0.$

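Let curl $= \nabla\times$ denote the curl operator. (Here and henceforth, we use notation for
the case $d = 3$ and consider the special case $d = 2$ in the natural way by identifying
$\mathbb{R}^2$ with the $(x_1, x_2)$-plane in $\mathbb{R}^3$. Thus, if $\mathbf{u}$ is two dimensional,
then the curl of $\mathbf{u}$ means the scalar function

    $\nabla\times\mathbf{u} = \partial_1 u_2 - \partial_2 u_1,$

where $u_1$ and $u_2$ are the components of $\mathbf{u}$.) It can be easily checked that

(2.4)    $\nabla\times(\nabla\times\mathbf{u}) = -\Delta\mathbf{u} + \nabla(\nabla\cdot\mathbf{u}).$

(For $d = 2$, relation (2.4) is interpreted as

    $\nabla^{\perp}(\nabla\times\mathbf{u}) = -\Delta\mathbf{u} + \nabla(\nabla\cdot\mathbf{u}),$

where $\nabla^{\perp}$ is the formal adjoint of $\nabla\times$ defined by

    $\nabla^{\perp} q = (\partial_2 q, \, -\partial_1 q)^t.$)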

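Introducing the vorticity variable

    $\boldsymbol{\omega} = \nabla\times\mathbf{u},$

using the identity (2.4), and remembering the "continuity" condition
$\nabla\cdot\mathbf{u} + \delta p = 0$, the generalized Stokes equation (2.1) may be rewritten
in vorticity form as follows:

(2.5)    $\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p = \mathbf{f}$    in $\Omega$,
         $\nabla\times\mathbf{u} - \boldsymbol{\omega} = \mathbf{0}$    in $\Omega$,
         $\nabla\cdot\mathbf{u} + \delta p = 0$    in $\Omega$.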
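
For readers who want the intermediate step, the first equation of (2.5) follows from (2.1) in two lines. The short LaTeX derivation below is our own spelling-out of the argument indicated in the text; it uses only identity (2.4), the definition of the vorticity, and the second equation of (2.1).

```latex
\begin{align*}
-\nu\Delta\mathbf{u}
  &= \nu\,\nabla\times(\nabla\times\mathbf{u}) - \nu\,\nabla(\nabla\cdot\mathbf{u})
     && \text{by (2.4)} \\
  &= \nu\,\nabla\times\boldsymbol{\omega} + \nu\delta\,\nabla p
     && \text{since } \boldsymbol{\omega} = \nabla\times\mathbf{u}
        \text{ and } \nabla\cdot\mathbf{u} = -\delta p, \\
\mathbf{f} = -\nu\Delta\mathbf{u} + \nabla p
  &= \nu\,\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\,\nabla p .
\end{align*}
```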


Next, we establish notation. We use the standard notation and definition for the Sobolev
space $H^s(\Omega)^d$ for $s \ge 0$; the standard associated inner product and norm are denoted by
$(\cdot, \cdot)_{s,\Omega}$ and $\|\cdot\|_{s,\Omega}$, respectively. (We suppress the subscript $d$ because
dependence of the vector norms on dimension will be clear by context. We will omit the measure
$\Omega$ from the inner product and norm designation when there is no risk of confusion.) For
$s = 0$, $H^s(\Omega)^d$ coincides with $L^2(\Omega)^d$. In this case, the norm and inner product will
be denoted by $\|\cdot\|$ and $(\cdot, \cdot)$, respectively. As usual, $H_0^s(\Omega)$ will denote the
closure of $\mathcal{D}(\Omega)$ with respect to the norm $\|\cdot\|_s$ and $H^{-s}(\Omega)$ will denote its
dual with norm defined by

    $\|\varphi\|_{-s} = \sup_{0 \ne \phi \in H_0^s(\Omega)} \dfrac{(\varphi, \phi)}{\|\phi\|_s}.$

Define the product spaces $H_0^1(\Omega)^d = \prod_{i=1}^d H_0^1(\Omega)$ and
$H^{-1}(\Omega)^d = \prod_{i=1}^d H^{-1}(\Omega)$ with standard product norms. Let


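    $H(\mathrm{div}; \Omega) = \{\mathbf{v} \in L^2(\Omega)^d : \nabla\cdot\mathbf{v} \in L^2(\Omega)\}$

and

    $H(\mathrm{curl}; \Omega) = \{\mathbf{v} \in L^2(\Omega)^d : \nabla\times\mathbf{v} \in L^2(\Omega)^{2d-3}\},$

which are Hilbert spaces under the respective norms

    $\|\mathbf{v}\|_{H(\mathrm{div};\Omega)} = \left(\|\mathbf{v}\|^2 + \|\nabla\cdot\mathbf{v}\|^2\right)^{1/2}$

and

    $\|\mathbf{v}\|_{H(\mathrm{curl};\Omega)} = \left(\|\mathbf{v}\|^2 + \|\nabla\times\mathbf{v}\|^2\right)^{1/2}.$

Define their subspaces

    $H_0(\mathrm{div}; \Omega) = \{\mathbf{v} \in H(\mathrm{div}; \Omega) : \mathbf{v}\cdot\mathbf{n} = 0 \text{ on } \partial\Omega\}$

and

    $H_0(\mathrm{curl}; \Omega) = \{\mathbf{v} \in H(\mathrm{curl}; \Omega) : \gamma_\tau\mathbf{v} = 0 \text{ on } \partial\Omega\},$

where $\gamma_\tau\mathbf{v} = \mathbf{v}\cdot\boldsymbol{\tau}$ for $d = 2$ and $\gamma_\tau\mathbf{v} = \mathbf{v}\times\mathbf{n}$
for $d = 3$, and $\mathbf{n}$ and $\boldsymbol{\tau}$ denote the respective unit vectors normal and tangent
to the boundary. Finally, define the subspace of $L^2(\Omega)^d$ by

    $L_0^2(\Omega)^d = \{\mathbf{v} \in L^2(\Omega)^d : \int_\Omega v_i \, dx = 0 \text{ for } i = 1, \ldots, d\}.$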

Here and henceforth, we will use $C$ with or without subscripts to denote a generic
positive constant, possibly different at different occurrences; this positive constant is
independent of the Reynolds parameter $\nu$ and other parameters introduced in this paper, but
may depend on the domain $\Omega$. The next lemma is an immediate consequence of a general
result of functional analysis due to Nečas [12] (see also [8]).

Lemma 2.1. For any $p \in L_0^2(\Omega)$, there exists a positive constant $C$ such that

(2.6)    $\|p\| \le C \, \|\nabla p\|_{-1}.$

A result analogous to Green's formula also follows:

(2.7)    $(\nabla\times\mathbf{z}, \boldsymbol{\phi}) = (\mathbf{z}, \nabla\times\boldsymbol{\phi}) - \int_{\partial\Omega} \boldsymbol{\phi}\cdot(\mathbf{z}\times\mathbf{n}) \, ds$

for $\mathbf{z} \in H(\mathrm{curl}; \Omega)$ and $\boldsymbol{\phi} \in H^1(\Omega)^d$.

Finally, we will summarize results of Lemma 2.5 and Remark 2.7 in Chapter I of [8]
that we will need in subsequent sections.

Lemma 2.2. For any $\mathbf{v} \in H_0(\mathrm{div}; \Omega) \cap H_0(\mathrm{curl}; \Omega)$, there exists a
positive constant $C$ such that

(2.8)    $\|\mathbf{v}\|_1 \le C \left(\|\nabla\cdot\mathbf{v}\| + \|\nabla\times\mathbf{v}\|\right).$

2.2. Least-Squares Functional. Our least-squares functional is defined by the weighted
sum of the $L^2$- and $H^{-1}$-norms of the residual equations of system (2.5):

(2.9)    $G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f}) = \|\mathbf{f} - (\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p)\|_{-1}^2 + \nu^2\|\nabla\times\mathbf{u} - \boldsymbol{\omega}\|^2 + \nu^2\|\nabla\cdot\mathbf{u} + \delta p\|^2.$

(A similar functional without the weights of the Reynolds parameter $\nu$ for the Stokes
equations was considered by Bochev and Gunzburger in [5].) The least-squares problem we
consider is to minimize the above quadratic functional over
$V = H_0^1(\Omega)^d \times L^2(\Omega)^{2d-3} \times L_0^2(\Omega)$:
find $(\mathbf{u}, \boldsymbol{\omega}, p) \in V$ such that

(2.10)    $G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f}) = \inf_{(\mathbf{v}, \boldsymbol{\sigma}, q) \in V} G(\mathbf{v}, \boldsymbol{\sigma}, q; \mathbf{f}).$

Next, we use an approach that departs from the established ADN theory (cf. [5]) to show
ellipticity of the functional.

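Theorem 2.1. For any $(\mathbf{u}, \boldsymbol{\omega}, p) \in V$, positive constants $C_1$ and $C_2$ exist,
independent of $\nu$, such that

(2.11)    $C_1 \left(\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2\right) \le G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0})$

and

(2.12)    $G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0}) \le C_2 \left(\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2\right).$

Proof. Upper bound (2.12) is straightforward from the triangle and Cauchy-Schwarz
inequalities. We proceed to show the validity of (2.11) for
$(\mathbf{u}, \boldsymbol{\omega}, p) \in H_0^1(\Omega)^d \times H(\mathrm{curl}; \Omega) \times (L_0^2(\Omega) \cap H^1(\Omega))$.
It will then follow for $(\mathbf{u}, \boldsymbol{\omega}, p) \in V$ by continuity. Now from (2.7) and
the Cauchy-Schwarz inequality, for any $\boldsymbol{\phi} \in H_0^1(\Omega)^d$ we have

    $\frac{1 + \nu\delta}{\nu}(\nabla p, \boldsymbol{\phi}) = \frac{1}{\nu}(\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p, \boldsymbol{\phi}) + (\nabla\times\mathbf{u} - \boldsymbol{\omega}, \nabla\times\boldsymbol{\phi}) - (\nabla\times\mathbf{u}, \nabla\times\boldsymbol{\phi})$

    $\le C \left(\frac{1}{\nu}\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1} + \|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| + \|\nabla\times\mathbf{u}\|\right) \|\boldsymbol{\phi}\|_1,$

which, together with Lemma 2.1, implies that

    $\frac{1 + \nu\delta}{\nu}\|p\| \le C \, \frac{1 + \nu\delta}{\nu}\|\nabla p\|_{-1}$

(2.13)    $\le C \left(\frac{1}{\nu}\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1} + \|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| + \|\nabla\times\mathbf{u}\|\right).$

By (2.7), the Cauchy-Schwarz and triangle inequalities, Lemma 2.2, and (2.13), we have
that

    $\|\nabla\times\mathbf{u}\|^2 = (\nabla\times\mathbf{u} - \boldsymbol{\omega}, \nabla\times\mathbf{u}) + \frac{1}{\nu}(\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p, \mathbf{u})$

    $\qquad + \frac{1 + \nu\delta}{\nu}(p, \nabla\cdot\mathbf{u} + \delta p) - \frac{\delta(1 + \nu\delta)}{\nu}(p, p)$

    $\le \|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| \, \|\nabla\times\mathbf{u}\| + \frac{1}{\nu}\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1} \|\mathbf{u}\|_1 + \frac{1 + \nu\delta}{\nu}\|p\| \, \|\nabla\cdot\mathbf{u} + \delta p\|$

    $\le \frac{C}{\nu}\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1} \left(\|\nabla\times\mathbf{u}\| + \|\nabla\cdot\mathbf{u} + \delta p\| + \delta\|p\|\right)$

    $\qquad + \|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| \, \|\nabla\times\mathbf{u}\| + \frac{1 + \nu\delta}{\nu}\|p\| \, \|\nabla\cdot\mathbf{u} + \delta p\|$

    $\le C \|\nabla\times\mathbf{u}\| \left(\|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| + \frac{1}{\nu}\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1} + \|\nabla\cdot\mathbf{u} + \delta p\|\right) + \frac{C}{\nu^2} G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0})$

    $\le \frac{1}{2}\|\nabla\times\mathbf{u}\|^2 + \frac{C}{\nu^2} G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0}).$

Hence,

(2.14)    $\|\nabla\times\mathbf{u}\|^2 \le \frac{C}{\nu^2} G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0}).$

But (2.14), (2.13), the bounds

    $\|\boldsymbol{\omega}\| \le \|\nabla\times\mathbf{u} - \boldsymbol{\omega}\| + \|\nabla\times\mathbf{u}\|$  and  $\|\nabla\cdot\mathbf{u}\| \le \|\nabla\cdot\mathbf{u} + \delta p\| + \delta\|p\|,$

and Lemma 2.2 imply (2.11). This completes the proof of the theorem. □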

3. Finite Element Approximations. We approximate the minimum of
$G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f})$ in (2.10) using a Rayleigh-Ritz type finite element method.
Assuming that the domain $\Omega$ is a polyhedron, let $\mathcal{T}_h$ be a partition of $\Omega$ into
finite elements, i.e., $\Omega = \bigcup_{K \in \mathcal{T}_h} K$ with
$h = \max\{\mathrm{diam}(K) : K \in \mathcal{T}_h\}$. Assume that the triangulation $\mathcal{T}_h$ is
quasi-uniform, i.e., it is regular and satisfies the inverse assumption (see [7]). Let
$V_h = U_h \times W_h \times P_h$ be a finite-dimensional subspace of $V$ with the following
properties: for any $(\mathbf{u}, \boldsymbol{\omega}, p) \in \left(H^{r+1}(\Omega)^d \times H^r(\Omega)^{2d-2}\right) \cap V$,

(3.1)    $\inf_{\mathbf{v} \in U_h} \left(\|\mathbf{u} - \mathbf{v}\| + h\|\mathbf{u} - \mathbf{v}\|_1\right) \le C h^{r+1} \|\mathbf{u}\|_{r+1},$

(3.2)    $\inf_{\boldsymbol{\sigma} \in W_h} \left(\|\boldsymbol{\omega} - \boldsymbol{\sigma}\| + h\|\boldsymbol{\omega} - \boldsymbol{\sigma}\|_1\right) \le C h^r \|\boldsymbol{\omega}\|_r,$

(3.3)    $\inf_{q \in P_h} \left(\|p - q\| + h\|p - q\|_1\right) \le C h^r \|p\|_r,$

where $r \ge 1$ is an integer. It is well-known that (3.1)-(3.3) hold for typical finite element
spaces consisting of piecewise polynomials with respect to quasi-uniform triangulations (cf.
[7]).

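The finite element approximation to minimizing $G(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f})$ in (2.10)
on $V$ becomes: find $(\mathbf{u}_h, \boldsymbol{\omega}_h, p_h) \in V_h$ that satisfies

(3.4)    $G(\mathbf{u}_h, \boldsymbol{\omega}_h, p_h; \mathbf{f}) = \inf_{(\mathbf{v}, \boldsymbol{\sigma}, q) \in V_h} G(\mathbf{v}, \boldsymbol{\sigma}, q; \mathbf{f}).$

Denote the norm induced by the functional according to

    $|||(\mathbf{u}, \boldsymbol{\omega}, p)|||_V = \left(\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2\right)^{1/2}.$

Theorem 3.1. Assume that
$(\mathbf{u}, \boldsymbol{\omega}, p) \in \left(H^{r+1}(\Omega)^d \times H^r(\Omega)^{2d-2}\right) \cap V$ is the solution
of problem (2.10). Then

(3.5)    $|||(\mathbf{u}, \boldsymbol{\omega}, p) - (\mathbf{u}_h, \boldsymbol{\omega}_h, p_h)|||_V \le C h^r d_r(\mathbf{u}, \boldsymbol{\omega}, p),$

where $C$ depends only on the domain $\Omega$ and the ratio of the constants $C_2$ and $C_1$ in
Theorem 2.1 and where

(3.6)    $d_r(\mathbf{u}, \boldsymbol{\omega}, p) = \left(\nu^2\|\mathbf{u}\|_{r+1}^2 + \nu^2\|\boldsymbol{\omega}\|_r^2 + (1 + \nu\delta)^2\|p\|_r^2\right)^{1/2}.$

Proof. It is easy to see that the error
$(\mathbf{u} - \mathbf{u}_h, \boldsymbol{\omega} - \boldsymbol{\omega}_h, p - p_h)$ is orthogonal to $V_h$ with
respect to the inner product corresponding to the norm $|||\cdot|||_V$. Bound (3.5) now follows
from Theorem 2.1 and approximation properties (3.1)-(3.3). □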
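
The step from Theorem 2.1 to bound (3.5) can be spelled out as a Céa-type estimate. The LaTeX sketch below is one standard route, not necessarily the authors' exact argument. It assumes that the exact solution satisfies system (2.5), so that the functional evaluated at any trial triple equals the homogeneous functional evaluated at the difference from the exact solution.

```latex
% Assumption: (u, omega, p) solves (2.5), hence for every (v, sigma, q) in V_h
%   G(v, sigma, q; f) = G(u - v, omega - sigma, p - q; 0).
\begin{align*}
C_1\,|||(\mathbf{u}-\mathbf{u}_h,\ \boldsymbol{\omega}-\boldsymbol{\omega}_h,\ p-p_h)|||_V^2
  &\le G(\mathbf{u}-\mathbf{u}_h,\ \boldsymbol{\omega}-\boldsymbol{\omega}_h,\ p-p_h;\ \mathbf{0})
     && \text{by (2.11)} \\
  &= G(\mathbf{u}_h,\ \boldsymbol{\omega}_h,\ p_h;\ \mathbf{f}) \\
  &\le G(\mathbf{v},\ \boldsymbol{\sigma},\ q;\ \mathbf{f})
     && \text{by (3.4), for any } (\mathbf{v},\boldsymbol{\sigma},q)\in V_h \\
  &= G(\mathbf{u}-\mathbf{v},\ \boldsymbol{\omega}-\boldsymbol{\sigma},\ p-q;\ \mathbf{0}) \\
  &\le C_2\,|||(\mathbf{u}-\mathbf{v},\ \boldsymbol{\omega}-\boldsymbol{\sigma},\ p-q)|||_V^2
     && \text{by (2.12)}.
\end{align*}
% Taking the infimum over V_h and applying (3.1)-(3.3) termwise gives
%   |||error|||_V <= (C_2 / C_1)^{1/2} C h^r d_r(u, omega, p).
```

This also makes visible why the constant in (3.5) depends on the ratio of $C_2$ and $C_1$.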

REMARK 3.1. The above result indicates that the finite element approximation is
optimal, both with respect to the order of approximation and the required regularity of the
solution (see [3]). More specifically, bound (3.5) holds with

    $d_r(\mathbf{u}, \boldsymbol{\omega}, p) = \left(\nu^2\|\mathbf{u}\|_{r+1}^2 + (1 + \nu\delta)^2\|p\|_r^2\right)^{1/2}$

since $\boldsymbol{\omega} = \nabla\times\mathbf{u}$ and $\|\nabla\times\mathbf{u}\|_r \le C\|\mathbf{u}\|_{r+1}$.

4. Solution Method and Discrete $H^{-1}$ Functional. Theorem 3.1 indicates that
the finite element approximation based on the functional $G$ is also optimal with respect to
the required regularity of the solution. Notice that the functional involves the $H^{-1}$ norm,
which in turn requires solution of a boundary value problem for its evaluation. There are
two existing approaches to make the method computationally feasible: the mesh-dependent
least-squares scheme proposed by Aziz, Kellogg, and Stephens [2] (see also [5]) and the
discrete $H^{-1}$-norm scheme proposed by Bramble, Lazarov, and Pasciak [3]. As mentioned
in the introduction, it is not clear that a fast solution algorithm for the resulting discrete
equations from the mesh-dependent least-squares method can be developed at this stage of
research. In this paper, we will therefore adopt the discrete $H^{-1}$-norm approach. Following
[3], the $H^{-1}$-norm in the functional is replaced by a discrete norm. This discrete $H^{-1}$
functional is computable and can be uniformly preconditioned by well-known techniques.
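
To illustrate why evaluating an $H^{-1}$-type norm amounts to solving a boundary value problem, the following Python sketch computes a discrete analogue of the squared norm for the constant function v = 1 on the interval (0, 1), using piecewise-linear elements on a uniform mesh and a Galerkin solve of the Poisson problem. This one-dimensional setting, the function names, and the direct tridiagonal solver are our illustrative assumptions; they are not the authors' code, and the paper replaces the exact discrete solve by a preconditioner. For this example the continuous value is 1/12, obtained from the solution x(1 - x)/2 of the Poisson problem with right-hand side 1.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] and upper[i] multiply x[i-1] and x[i+1] in row i."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def discrete_hminus1_norm_sq(n_cells):
    """Discrete solve-based norm squared of v = 1 on (0, 1), linear elements."""
    h = 1.0 / n_cells
    m = n_cells - 1                  # number of interior nodes
    diag = [2.0 / h] * m             # stiffness matrix of -d^2/dx^2
    off = [-1.0 / h] * m
    load = [h] * m                   # (v, psi_i) = h for v = 1
    coeffs = solve_tridiagonal(off, diag, off, load)
    # (phi_h, v) = sum_i coeffs[i] * (psi_i, v), with phi_h the Galerkin solution
    return sum(ci * bi for ci, bi in zip(coeffs, load))


values = {n: discrete_hminus1_norm_sq(n) for n in (4, 16, 64)}
for n, val in values.items():
    print(n, val, abs(val - 1.0 / 12.0))
```

As the mesh is refined, the computed values approach 1/12, and each evaluation costs one discrete Poisson solve, which is the expense the preconditioned functional introduced next is designed to avoid.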

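To this end, let $A : H^{-1}(\Omega)^d \to H_0^1(\Omega)^d$ denote the solution operator for the
Poisson problem

(4.1)    $-\Delta\boldsymbol{\phi} = \mathbf{v}$    in $\Omega$,
         $\boldsymbol{\phi} = \mathbf{0}$    on $\partial\Omega$,

i.e., $A\mathbf{v} = \boldsymbol{\phi}$ for a given $\mathbf{v} \in H^{-1}(\Omega)^d$ is the solution to (4.1).
It is well-known that $(A\,\cdot, \cdot)^{1/2}$ defines a norm that is equivalent to the
$H^{-1}(\Omega)^d$ norm. Let $A_h : L^2(\Omega)^d \to U_h$ be defined by
$A_h\boldsymbol{\varphi} = \boldsymbol{\phi}$, where $\boldsymbol{\phi}$ is the unique solution in $U_h$ satisfying

    $\int_\Omega \nabla\boldsymbol{\phi}\cdot\nabla\boldsymbol{\psi} \, dx = (\boldsymbol{\varphi}, \boldsymbol{\psi}), \quad \forall\, \boldsymbol{\psi} \in U_h.$

Assume that there is a preconditioner $B_h : L^2(\Omega)^d \to U_h$ that is symmetric with respect
to the $L^2(\Omega)^d$ inner product and spectrally equivalent to $A_h$, i.e., there are positive
constants $C_1$ and $C_2$, not depending on $h$, that satisfy

(4.2)    $C_1 (A_h\boldsymbol{\phi}, \boldsymbol{\phi}) \le (B_h\boldsymbol{\phi}, \boldsymbol{\phi}) \le C_2 (A_h\boldsymbol{\phi}, \boldsymbol{\phi}), \quad \forall\, \boldsymbol{\phi} \in U_h.$

Following [3], define $\tilde{A}_h = h^2 I + B_h$, where $I$ denotes the identity operator on $U_h$. In
the remainder of this section, we analyze the least-squares approximation based on the
functional

    $G_h(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f}) = \left(\tilde{A}_h(\mathbf{f} - (\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p)), \ \mathbf{f} - (\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p)\right)$

(4.3)    $\qquad + \nu^2\|\nabla\times\mathbf{u} - \boldsymbol{\omega}\|^2 + \nu^2\|\nabla\cdot\mathbf{u} + \delta p\|^2.$

Define the norm corresponding to the functional $G_h$ by

    $|||(\mathbf{u}, \boldsymbol{\omega}, p)|||_{V_h} = \sqrt{G_h(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0})}.$

Let $Q_h : L^2(\Omega)^d \to U_h$ denote the $L^2(\Omega)^d$ orthogonal projection operator onto $U_h$. We
assume that $Q_h$ is bounded on $H_0^1(\Omega)^d$, i.e.,

(4.4)    $\|Q_h\mathbf{v}\|_1 \le C\|\mathbf{v}\|_1, \quad \forall\, \mathbf{v} \in H_0^1(\Omega)^d.$

Remark 4.1. The symmetry of $B_h$ with respect to the inner product on $L^2(\Omega)^d$ implies
that $B_h = B_h Q_h$. Similarly, $A_h = A_h Q_h$. Thus, (4.2) holds for any $\mathbf{v} \in L^2(\Omega)^d$.

It is easy to check that assumptions (3.1) and (4.4) imply that

(4.5)    $\|(I - Q_h)\mathbf{v}\|_{-1} \le C h \|\mathbf{v}\|, \quad \forall\, \mathbf{v} \in L^2(\Omega)^d,$

and that (see [3])

(4.6)    $\|Q_h\mathbf{v}\|_{-1}^2 \le C (A_h\mathbf{v}, \mathbf{v}) \le C \|\mathbf{v}\|_{-1}^2, \quad \forall\, \mathbf{v} \in L^2(\Omega)^d.$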

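Lemma 4.1. For any
$(\mathbf{u}, \boldsymbol{\omega}, p) \in H_0^1(\Omega)^d \times H(\mathrm{curl}; \Omega) \times (L_0^2(\Omega) \cap H^1(\Omega))$,
positive constants $C_1$ and $C_2$ exist, independent of $h$ and $\nu$, such that

    $C_1 \left(\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2\right) \le |||(\mathbf{u}, \boldsymbol{\omega}, p)|||_{V_h}^2$

(4.7)    $\le C_2 \left(\nu^2\|\mathbf{u}\|_1^2 + \nu^2 h^2\|\nabla\times\boldsymbol{\omega}\|^2 + \nu^2\|\boldsymbol{\omega}\|^2 + h^2(1 + \nu\delta)^2\|\nabla p\|^2 + (1 + \nu\delta)^2\|p\|^2\right).$

Proof. By Remark 4.1 and (4.6), we have that

    $(\tilde{A}_h\boldsymbol{\phi}, \boldsymbol{\phi}) \le C \left(h^2\|\boldsymbol{\phi}\|^2 + (A_h\boldsymbol{\phi}, \boldsymbol{\phi})\right) \le C \left(h^2\|\boldsymbol{\phi}\|^2 + \|\boldsymbol{\phi}\|_{-1}^2\right), \quad \forall\, \boldsymbol{\phi} \in L^2(\Omega)^d,$

which, together with the triangle inequality and Theorem 2.1, imply the upper bound in
(4.7). To prove the first inequality in (4.7), by Theorem 2.1 it suffices to show that

    $\|\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\|_{-1}^2 \le C \left(\tilde{A}_h(\nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p), \ \nu\nabla\times\boldsymbol{\omega} + (1 + \nu\delta)\nabla p\right)$

for any $\boldsymbol{\omega} \in H(\mathrm{curl}; \Omega)$ and any $p \in H^1(\Omega)$. From (4.5), (4.6), and
Remark 4.1, for any $\boldsymbol{\phi} \in L^2(\Omega)^d$ we have

    $\|\boldsymbol{\phi}\|_{-1}^2 \le 2 \left(\|(I - Q_h)\boldsymbol{\phi}\|_{-1}^2 + \|Q_h\boldsymbol{\phi}\|_{-1}^2\right)$

    $\le C \left(h^2\|\boldsymbol{\phi}\|^2 + (A_h\boldsymbol{\phi}, \boldsymbol{\phi})\right)$

    $\le C (\tilde{A}_h\boldsymbol{\phi}, \boldsymbol{\phi}).$

This completes the proof of the lemma. □

Remark 4.2. If $W_h \subset H(\mathrm{curl}; \Omega)$ and $P_h \subset L_0^2(\Omega) \cap H^1(\Omega)$ satisfy
an inverse inequality of the form

    $\|\nabla\times\boldsymbol{\omega}\| \le C h^{-1}\|\boldsymbol{\omega}\|$  and  $\|\nabla p\| \le C h^{-1}\|p\|,$

respectively, then the upper bound in the second inequality of (4.7) can be replaced by
$\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2$ for any
$\mathbf{u} \in H_0^1(\Omega)^d$, any $\boldsymbol{\omega} \in W_h$, and any $p \in P_h$. It is well-known
(cf. [7]) that the above inverse inequalities hold for typical finite element spaces consisting
of piecewise polynomials on quasi-uniform triangulations.

THEOREM 4.1. Let $(\mathbf{u}_h, \boldsymbol{\omega}_h, p_h) \in V_h$ be the unique minimizer of
$G_h(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{f})$ over $V_h$ and let
$(\mathbf{u}, \boldsymbol{\omega}, p) \in \left(H^{r+1}(\Omega)^d \times H^r(\Omega)^{2d-3} \times H^r(\Omega)\right) \cap V$
be the solution of problem (2.10). Then

(4.8)    $\nu\|\mathbf{u} - \mathbf{u}_h\|_1 + \nu\|\boldsymbol{\omega} - \boldsymbol{\omega}_h\| + (1 + \nu\delta)\|p - p_h\| \le C h^r \left(\nu^2\|\mathbf{u}\|_{r+1}^2 + (1 + \nu\delta)^2\|p\|_r^2\right)^{1/2},$

where $C$ is independent of the mesh size $h$ and the Reynolds parameter $\nu$.

Proof. It is easy to see that the error
$(\mathbf{u} - \mathbf{u}_h, \boldsymbol{\omega} - \boldsymbol{\omega}_h, p - p_h)$ is orthogonal to $V_h$ with
respect to the inner product corresponding to the norm $|||\cdot|||_{V_h}$. Bound (4.8) now follows
from Lemma 4.1 and approximation properties (3.1)-(3.3). □

For the finite element spaces $W_h$ and $P_h$ satisfying the inverse inequalities in
Remark 4.2, the discrete $H^{-1}$ functional $G_h(\mathbf{u}, \boldsymbol{\omega}, p; \mathbf{0})$ can be
preconditioned by the functional
$\nu^2\|\mathbf{u}\|_1^2 + \nu^2\|\boldsymbol{\omega}\|^2 + (1 + \nu\delta)^2\|p\|^2$, which decouples the
velocity, vorticity, and pressure unknowns, because the two are spectrally equivalent uniformly
in the mesh size $h$ and the Reynolds parameter $\nu$ (see Lemma 4.1 and Remark 4.2). We can use
any effective elliptic preconditioners associated with velocity $\mathbf{u}$, including those of
multigrid type, and simple preconditioners associated with vorticity $\boldsymbol{\omega}$ and
pressure $p$, including those of diagonal matrix type.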

Acknowledgments. We thank Professors Pavel Bochev and Seymour Parter for helpful
discussions.

FIRST-ORDER SYSTEM LEAST SQUARES FOR THE STOKES
EQUATIONS, WITH APPLICATION TO LINEAR ELASTICITY

Z. CAI*, T. A. MANTEUFFEL*, AND S. F. MCCORMICK*

Abstract. Following our earlier work on general second-order scalar equations, here we develop a
least-squares functional for the two- and three-dimensional Stokes equations, generalized slightly by
allowing a pressure term in the continuity equation. By introducing a velocity flux variable and
associated curl and trace equations, we are able to establish ellipticity in an $H^1$ product norm
appropriately weighted by the Reynolds number. This immediately yields optimal discretization error
estimates for finite element spaces in this norm and optimal algebraic convergence estimates for
multiplicative and additive multigrid methods applied to the resulting discrete systems. Both
estimates are uniform in the Reynolds number. Moreover, our pressure-perturbed form of the
generalized Stokes equations allows us to develop an analogous result for the Dirichlet problem for
linear elasticity with estimates that are uniform in the Lamé constants.

Key words. least squares, multigrid, Stokes equations

AMS(MOS) subject classifications. 65F10, 65F30

1. Introduction. In earlier work [9, 10], we developed least-squares functionals for
a first-order system formulation of general second-order elliptic scalar partial differential
equations. The functional developed in [10] was shown to be elliptic in the sense that its
homogeneous form applied to the $n + 1$ variables (pressure and velocities) is equivalent to
the $(H^1)^{n+1}$ norm. This means that the individual variables in the functional are essentially
decoupled (more precisely, their interactions are essentially subdominant). This important
property ensures that standard finite element methods are of $H^1$-optimal accuracy in each
variable and that multiplicative and additive multigrid methods applied to the resulting
discrete equations are optimally convergent.

The purpose of this paper is to extend this methodology to the Stokes equations in two
and three dimensions. To this end, we begin by reformulating the Stokes equations as a
first-order system derived in terms of an additional vector variable, the velocity flux, defined
as the vector of gradients of the Stokes velocities. We first apply a least-squares principle
to this system using $L^2$ and $H^{-1}$ norms weighted appropriately by the Reynolds number,
Re. We then show that the resulting functional is elliptic in a product norm involving Re
and the $L^2$ and $H^1$ norms. While of theoretical interest in its own right, we use this result
here primarily as a vehicle for establishing that a modified form of this functional is fully
elliptic in an $H^1$ product norm scaled by Re.

This appears to be the first general theory of this kind for the Stokes equations in general 
dimensions with velocity boundary conditions. Bochev and Gunzburger [6] developed least- 
squares functionals for Stokes equations in norms that include stronger Sobolev terms and 
mesh weighting, but none are product H 1 elliptic. Chang [11] also used velocity derivative 
variables to derive a product H 1 elliptic functional for Stokes equations, but it is inherently 
limited to two dimensions. For general dimensions, a vorticity-velocity-pressure form (cf. 
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Angeles, CA 90089-1113. email : zceii@math.usc.edu. 
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sponsored by the Air Force Office of Scientific Reseeirch under greint number AFOSR-91-0156, the Nationeil 
Science Foundation under greint number DMS-8704169, and the Department of Energy under grant number 
DE-FG03-93ER25 165. 
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[4, 15]) proved to be product H 1 elliptic, but only for certain nonstandard boundary con- 
ditions. For the more practical (cf. [14, 17, 19]) velocity boundary conditions treated here, 
the velocity-vorticity-pressure formulation examined by Chang [12] can be shown by coun- 
terexample [3] not to be equivalent to any H 1 product norm, even with the added boundary 
condition on the normal component of vorticity. Moreover, this formulation admits no ap- 
parent additional equation, such as the curl and trace constraints introduced below for our 
formulation, which would enable such an equivalence. The velocity-pressure-stress formulation
described in [7] has the same shortcomings. (If the vorticity and deformation stress
variables are important, then they can be easily and accurately reconstructed from the 
velocity-flux variables introduced in our formulation.) 

While our least-squares form requires several new dependent variables, we believe that 
the added cost is more than offset by the strengthened accuracy of the discretization and 
the speed that the attendant multigrid solution process attains. Moreover, our modified 
functional requires strong regularity conditions; this requirement is to be expected for ob- 
taining full product H 1 ellipticity in all variables, including velocity fluxes. (We thus obtain 
optimal H 1 estimates for the derivatives of velocity.) In any case, strengthened regularity 
is not necessary for the first functional we introduce. 

Our modified Stokes functional is obtained essentially by augmenting the first-order 
system with a curl constraint and a scalar (trace) equation involving certain derivatives of 
the velocity flux variable and then appealing to a simple L 2 least-squares principle. As in 
[10] for the scalar case, the important H 1 ellipticity property that we establish guarantees 
optimal finite element accuracy and multigrid convergence rates applied to this Stokes least- 
squares functional that are uniform in Re. 

One of the more compelling benefits of least squares is the freedom to incorporate 
additional equations and impose additional boundary conditions as long as the system is 
consistent. In fact, many problems are perhaps best treated with overdetermined (but 
consistent) first-order systems, as we have here for the Stokes equations. We therefore 
abandon the so-called ADN theory (cf. [1, 2]), which is restricted to square systems, in 
favor of more direct tools of analysis. 

An important aspect of our general formulation is that it applies equally well to the 
Dirichlet problem for linear elasticity. This is done by posing the Stokes equations in a 
slightly generalized form that includes a pressure term in the continuity equation. Our 
development and results then automatically apply to linear elasticity. Most important, our 
optimal discretization and solver estimates are uniform in the Lamé constants.

We emphasize that the discretization and algebraic convergence properties for the gen- 
eralized Stokes equations are automatic consequences of the H 1 product norm ellipticity 
established here and the finite element and multigrid theories established in Sections 3-5 
of [10]. We are therefore content with an abbreviated paper that focuses on establishing 
ellipticity, which we do in Section 3. Section 2 introduces the generalized Stokes equations, 
the two relevant first-order systems and their functionals, and some preliminary theory. 
Concluding remarks are made in Section 4. 

2. The Stokes Problem, Its First-Order System Formulation, and Other
Preliminaries. Let $\Omega$ be a bounded, open, connected domain in $\mathbb{R}^n$ ($n = 2$ or $3$) with
Lipschitz boundary $\partial\Omega$. The pressure-perturbed form of the generalized stationary Stokes
equations in dimensionless variables may be written as

$$
\begin{cases}
-\nu \Delta \mathbf{u} + \nabla p = \mathbf{f}, & \text{in } \Omega, \\
\nabla \cdot \mathbf{u} + \delta p = g, & \text{in } \Omega,
\end{cases}
\tag{2.1}
$$

where the symbols $\Delta$, $\nabla$, and $\nabla\cdot$ stand for the Laplacian, gradient, and divergence operators,
respectively; $\nu$ is the reciprocal of the Reynolds number Re; $\mathbf{f}$ is a given vector function; $g$
is a given scalar function; and $\delta$ is some nonnegative constant ($\delta = 0$ for Stokes, $\delta = 1$ for
linear elasticity). Without loss of generality, we may assume that

$$
\int_\Omega g \, dz = \int_\Omega p \, dz = 0.
\tag{2.2}
$$

(For $\delta = 0$, equation (2.1) can have a solution only when $g$ satisfies (2.2), and we are then
free to ask that $p$ satisfy (2.2). For $\delta > 0$, in general we have only that $\int_\Omega g \, dz = \delta \int_\Omega p \, dz$,
but this can be reduced to (2.2) simply by replacing $p$ by $p - g/\delta$ and $g$ by $0$ in (2.1).)
We consider the (generalized) Stokes equations (2.1) together with the Dirichlet velocity
boundary condition

$$
\mathbf{u} = \mathbf{0} \quad \text{on } \partial\Omega.
\tag{2.3}
$$


The slightly generalized Stokes equations in (2.1) allow our results to apply to linear 
elasticity. In particular, consider the Dirichlet problem 


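$$
\begin{cases}
-\mu \Delta \mathbf{u} - (\lambda + \mu) \nabla \nabla \cdot \mathbf{u} = \mathbf{f}, & \text{in } \Omega, \\
\mathbf{u} = \mathbf{0}, & \text{on } \partial\Omega,
\end{cases}
\tag{2.4}
$$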


where $\mathbf{u}$ now represents displacements and $\mu$ and $\lambda$ are the (positive) Lamé constants. By
$\Delta \mathbf{u}$ here we mean the $n$-vector of components $\Delta u_i$; that is, $\Delta$ applies to $\mathbf{u}$ componentwise.
This is recast in form (2.1)-(2.2) by introducing the pressure variable$^1$ $p = -\nabla \cdot \mathbf{u}$, by
rescaling $\mathbf{f}$, and by letting $g = 0$, $\delta = 1$, and $\nu = \mu/(\lambda + \mu)$. (It is easy to see that this $p$
must satisfy (2.2).) An important consequence of the results we develop below is that
standard Rayleigh-Ritz discretization and multigrid solution methods can be applied with
optimal estimates that are uniform in $h$, $\lambda$, and $\mu$. For example, we obtain optimal uniform
approximation of the gradients of displacements in the H 1 product norm. This in turn 
implies analogous H 1 estimates for the stresses, which are easily obtained from the “velocity 
fluxes”. For related results with a different methodology and weaker norm estimates, see 
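The recasting of the elasticity problem (2.4) into the generalized Stokes form (2.1) can be sketched in a few lines of Python. This is only an illustration: the function name `lame_to_stokes` is hypothetical, and the value of $\nu$ is the one obtained by substituting $p = -\nabla \cdot \mathbf{u}$ into (2.4) and dividing through by $\lambda + \mu$, which is an assumption about the intended rescaling of $\mathbf{f}$.

```python
def lame_to_stokes(lam, mu):
    """Map Lame constants (lam, mu) to the parameters of form (2.1).

    Hypothetical helper: substituting p = -div(u) into (2.4) gives
    -mu*Lap(u) + (lam + mu)*grad(p) = f; dividing by (lam + mu) yields
    nu = mu / (lam + mu), delta = 1, and a source scaled by 1 / (lam + mu).
    """
    if lam <= 0 or mu <= 0:
        raise ValueError("Lame constants are assumed positive")
    nu = mu / (lam + mu)          # plays the role of 1/Re in (2.1)
    delta = 1.0                   # pressure-perturbation constant
    f_scale = 1.0 / (lam + mu)    # factor applied to the source term f
    return nu, delta, f_scale
```

For a nearly incompressible material ($\lambda \gg \mu$) the returned `nu` becomes small, which is the regime in which estimates uniform in the Lamé constants matter most.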

Let curl $= \nabla\times$ denote the curl operator. (Here and henceforth, we use notation for
the case $n = 3$ and consider the special case $n = 2$ in the natural way by identifying $\mathbb{R}^2$
with the $(x_1, x_2)$ plane in $\mathbb{R}^3$. Thus, if $\mathbf{u}$ is two dimensional, then the curl of $\mathbf{u}$ means the
scalar function

$$
\nabla \times \mathbf{u} = \partial_1 u_2 - \partial_2 u_1,
$$


1 Perhaps a more physical choice for this artificial pressure would have been p = — ^ V • u, since it then 
becomes the hydrostatic pressure in the incompressible limit. We chose our particular scaling because it
most easily conforms to (2.1). In any case, our results apply to virtually any nonnegative scaling of p, with
no effect on the equivalence constants (provided the norms are correspondingly scaled); see Theorems 3.1
and 3.2.
where $u_1$ and $u_2$ are the components of $\mathbf{u}$.) The following identity is immediate:

$$
\nabla \times (\nabla \times \mathbf{u}) = -\Delta \mathbf{u} + \nabla (\nabla \cdot \mathbf{u}).
\tag{2.5}
$$


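(For $n = 2$, (2.5) is interpreted as

$$
\nabla^{\perp} (\nabla \times \mathbf{u}) = -\Delta \mathbf{u} + \nabla (\nabla \cdot \mathbf{u}),
$$

where $\nabla^{\perp}$ is the formal adjoint of $\nabla\times$ defined by

$$
\nabla^{\perp} q = \begin{pmatrix} \partial_2 q \\ -\partial_1 q \end{pmatrix}.)
$$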
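The two-dimensional form of identity (2.5) can be checked numerically. The sketch below is an illustration only: it takes the adjoint of the scalar curl to be $q \mapsto (\partial_2 q, -\partial_1 q)^t$, uses a hypothetical smooth test field, and compares both sides with central finite differences, so the residual is limited by discretization error rather than being exactly zero.

```python
import math

H = 1.0e-3  # finite-difference step (assumed small enough for this smooth field)

def u(x, y):
    """Hypothetical smooth 2D test field (u1, u2)."""
    return (math.sin(x) * math.cos(2.0 * y), x * x * y + math.cos(x * y))

def d(f, axis, x, y, h=H):
    """Central difference of a scalar function f(x, y) along axis 0 or 1."""
    if axis == 0:
        return (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

def curl(x, y):
    """Scalar curl: d1 u2 - d2 u1."""
    return d(lambda a, b: u(a, b)[1], 0, x, y) - d(lambda a, b: u(a, b)[0], 1, x, y)

def div(x, y):
    """Divergence: d1 u1 + d2 u2."""
    return d(lambda a, b: u(a, b)[0], 0, x, y) + d(lambda a, b: u(a, b)[1], 1, x, y)

def laplacian(i, x, y, h=H):
    """Five-point Laplacian of component i of u."""
    f = lambda a, b: u(a, b)[i]
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / (h * h)

def identity_residual(x, y):
    """Max-norm difference between the two sides of the 2D identity at (x, y)."""
    lhs = (d(curl, 1, x, y), -d(curl, 0, x, y))            # adjoint curl of the scalar curl
    rhs = (-laplacian(0, x, y) + d(div, 0, x, y),
           -laplacian(1, x, y) + d(div, 1, x, y))          # -Lap(u) + grad(div u)
    return max(abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))
```

Evaluating `identity_residual` at an interior point such as `(0.3, 0.7)` should give a value far smaller than the individual terms, consistent with the identity holding exactly for smooth fields.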


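We will be introducing a new independent variable defined as the $n^2$-vector function
of gradients of the $u_i$, $i = 1, 2, \ldots, n$. It will be convenient to view the original $n$-vector
functions as column vectors and the new $n^2$-vector functions as either block column vectors
or matrices. Thus, given

$$
\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
$$

and denoting $\mathbf{u}^t = (u_1, u_2, \ldots, u_n)$, then an operator $G$ defined on scalar functions (e.g.,
$G = \nabla$) is extended to $n$-vectors componentwise:

$$
G \mathbf{u}^t = (G u_1, G u_2, \ldots, G u_n)
$$

and

$$
G \mathbf{u} = \begin{pmatrix} G u_1 \\ G u_2 \\ \vdots \\ G u_n \end{pmatrix}.
$$

If $\mathbf{U}_i = G u_i$ is an $n$-vector function, then we write the matrix

$$
\mathbf{U} = G \mathbf{u}^t = (\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_n)
= \begin{pmatrix}
U_{11} & U_{12} & \cdots & U_{1n} \\
U_{21} & U_{22} & \cdots & U_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
U_{n1} & U_{n2} & \cdots & U_{nn}
\end{pmatrix}.
$$

We then define the trace operator tr according to

$$
\operatorname{tr} \mathbf{U} = \sum_{i=1}^{n} U_{ii}.
$$

If $D$ is an operator on $n$-vector functions (e.g., $D = \nabla\times$), then its extension to matrices is
defined by

$$
D \mathbf{U} = (D \mathbf{U}_1, D \mathbf{U}_2, \ldots, D \mathbf{U}_n).
$$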
When each $D \mathbf{U}_i$ is a scalar function (e.g., $D = \nabla\cdot$), then we will want to view the extension
as a mapping to column vectors, so we will use the convention

$$
(D \mathbf{U})^t = \begin{pmatrix} D \mathbf{U}_1 \\ D \mathbf{U}_2 \\ \vdots \\ D \mathbf{U}_n \end{pmatrix}.
$$

We also extend the tangential operator $\mathbf{n}\times$ componentwise:

$$
\mathbf{n} \times \mathbf{U} = (\mathbf{n} \times \mathbf{U}_1, \mathbf{n} \times \mathbf{U}_2, \ldots, \mathbf{n} \times \mathbf{U}_n).
$$

Finally, inner products and norms on the matrix functions are defined in the natural
componentwise way, e.g.,

$$
\|\mathbf{U}\|^2 = \sum_{i=1}^{n} \|\mathbf{U}_i\|^2 = \sum_{i,j=1}^{n} \|U_{ij}\|^2.
$$

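If we introduce the velocity flux variable

$$
\mathbf{U} = \nabla \mathbf{u}^t = (\nabla u_1, \nabla u_2, \ldots, \nabla u_n),
$$

then the Stokes system (2.1) and (2.3) may be recast as the following equivalent first-order
system:

$$
\begin{cases}
\mathbf{U} - \nabla \mathbf{u}^t = \mathbf{0}, & \text{in } \Omega, \\
-\nu (\nabla \cdot \mathbf{U})^t + \nabla p = \mathbf{f}, & \text{in } \Omega, \\
\nabla \cdot \mathbf{u} + \delta p = g, & \text{in } \Omega, \\
\mathbf{u} = \mathbf{0}, & \text{on } \partial\Omega.
\end{cases}
\tag{2.6}
$$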


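Note that the definition of $\mathbf{U}$, the "continuity" condition $\nabla \cdot \mathbf{u} + \delta p = g$ in $\Omega$, and the
Dirichlet condition $\mathbf{u} = \mathbf{0}$ on $\partial\Omega$ imply the respective properties

$$
\nabla \times \mathbf{U} = \mathbf{0} \ \text{in } \Omega, \qquad
\operatorname{tr} \mathbf{U} + \delta p = g \ \text{in } \Omega, \qquad \text{and} \qquad
\mathbf{n} \times \mathbf{U} = \mathbf{0} \ \text{on } \partial\Omega.
\tag{2.7}
$$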


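Then an equivalent extended system for (2.6) is

$$
\begin{cases}
\mathbf{U} - \nabla \mathbf{u}^t = \mathbf{0}, & \text{in } \Omega, \\
-\nu (\nabla \cdot \mathbf{U})^t + \nabla p = \mathbf{f}, & \text{in } \Omega, \\
\nabla \cdot \mathbf{u} + \delta p = g, & \text{in } \Omega, \\
\nabla \operatorname{tr} \mathbf{U} + \delta \nabla p = \nabla g, & \text{in } \Omega, \\
\nabla \times \mathbf{U} = \mathbf{0}, & \text{in } \Omega, \\
\mathbf{u} = \mathbf{0}, & \text{on } \partial\Omega, \\
\mathbf{n} \times \mathbf{U} = \mathbf{0}, & \text{on } \partial\Omega.
\end{cases}
\tag{2.8}
$$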


Let $\mathcal{D}(\Omega)$ be the linear space of infinitely differentiable functions with compact support
on $\Omega$ and let $\mathcal{D}'(\Omega)$ denote the dual space of $\mathcal{D}(\Omega)$. The duality pairing between $\mathcal{D}'(\Omega)$ and
$\mathcal{D}(\Omega)$ is denoted by $\langle \cdot, \cdot \rangle$. We use the standard notation and definition for the Sobolev
spaces $H^s(\Omega)^n$ and $H^s(\partial\Omega)^n$ for $s \ge 0$; the standard associated inner products are denoted
by $(\cdot, \cdot)_{s,\Omega}$ and $(\cdot, \cdot)_{s,\partial\Omega}$, and their respective norms by $\|\cdot\|_{s,\Omega}$ and $\|\cdot\|_{s,\partial\Omega}$. (We suppress
the superscript $n$ because dependence of the vector norms on dimension will be clear by
context. We also omit $\Omega$ from the inner product and norm designation when there is no
risk of confusion.) For $s = 0$, $H^s(\Omega)^n$ coincides with $L^2(\Omega)^n$. In this case, the norm and
inner product will be denoted by $\|\cdot\|$ and $(\cdot, \cdot)$, respectively. As usual, $H_0^s(\Omega)$ is the closure
of $\mathcal{D}(\Omega)$ with respect to the norm $\|\cdot\|_s$, and $H^{-s}(\Omega)$ is its dual with norm defined by

$$
\|\varphi\|_{-s} = \sup_{0 \ne \psi \in H_0^s(\Omega)} \frac{\langle \varphi, \psi \rangle}{\|\psi\|_s}.
$$


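Define the product spaces $H_0^s(\Omega)^n = \prod_{i=1}^{n} H_0^s(\Omega)$ and $H^s(\Omega)^n = \prod_{i=1}^{n} H^s(\Omega)$ with standard
product norms. Let

$$
H(\mathrm{div}; \Omega) = \{ \mathbf{v} \in L^2(\Omega)^n : \nabla \cdot \mathbf{v} \in L^2(\Omega) \}
$$

and

$$
H(\mathrm{curl}; \Omega) = \{ \mathbf{v} \in L^2(\Omega)^n : \nabla \times \mathbf{v} \in L^2(\Omega)^{2n-3} \},
$$

which are Hilbert spaces under the respective norms

$$
\|\mathbf{v}\|_{H(\mathrm{div}; \Omega)} = \left( \|\mathbf{v}\|^2 + \|\nabla \cdot \mathbf{v}\|^2 \right)^{1/2}
$$

and

$$
\|\mathbf{v}\|_{H(\mathrm{curl}; \Omega)} = \left( \|\mathbf{v}\|^2 + \|\nabla \times \mathbf{v}\|^2 \right)^{1/2}.
$$

Define their subspaces

$$
H_0(\mathrm{div}; \Omega) = \{ \mathbf{v} \in H(\mathrm{div}; \Omega) : \mathbf{n} \cdot \mathbf{v} = 0 \text{ on } \partial\Omega \}
$$

and

$$
H_0(\mathrm{curl}; \Omega) = \{ \mathbf{v} \in H(\mathrm{curl}; \Omega) : \gamma_\tau \mathbf{v} = 0 \text{ on } \partial\Omega \},
$$

where $\gamma_\tau \mathbf{v} = \boldsymbol{\tau} \cdot \mathbf{v}$ for $n = 2$ and $\gamma_\tau \mathbf{v} = \mathbf{n} \times \mathbf{v}$ for $n = 3$; $\mathbf{n}$ and $\boldsymbol{\tau}$ denote the respective unit
vectors normal and tangent to the boundary. Finally, define

$$
L_0^2(\Omega)^n = \left\{ \mathbf{v} \in L^2(\Omega)^n : \int_\Omega v_i \, dx = 0 \text{ for } i = 1, \ldots, n \right\}.
$$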

It is well-known that the (weak form of the) boundary value problem (2.1)-(2.2) has a
unique solution $(\mathbf{u}, p) \in H_0^1(\Omega)^n \times L_0^2(\Omega)$ for any $\mathbf{f} \in H^{-1}(\Omega)^n$ and for $g \in H^1(\Omega)$ (e.g., see
[16, 17, 14]). Moreover, if the boundary of the domain $\Omega$ is $C^{1,1}$ or a convex polyhedron,
then the following $H^2$-regularity result holds:

$$
\nu \|\mathbf{u}\|_2 + \|p\|_1 \le C \left( \|\mathbf{f}\|_0 + \nu \|g\|_1 \right).
\tag{2.9}
$$

(We use $C$ with or without subscripts in this paper to denote a generic positive constant,
possibly different at different occurrences, which is independent of the Reynolds number
and other parameters introduced in this paper but may depend on the domain $\Omega$ or the
constant $\delta$.) Bound (2.9) is established for the case $\nu = 1$ and $\delta = 0$ in [16, 17]; the case for
general $\nu$ and $\delta = 0$ is then immediate. The case $\delta > 0$ follows from the well-known linear
elasticity bound $\|\mathbf{u}\|_2 + \|\sigma\|_1 \le C \|\mathbf{f}\|_0$, where $\mathbf{f}$ is the (unscaled) source term in (2.4) and
$\sigma$ is the stress tensor. We will need (2.9) to establish full $H^1$ product ellipticity of one of
our reformulations of (2.1)-(2.2); see Theorem 3.2.

The following lemma is an immediate consequence of a general functional analysis result 
due to Necas [18] (see also [14]). 

Lemma 2.1. For any $p$ in $L_0^2(\Omega)$, we have

$$
\|p\| \le C \|\nabla p\|_{-1}.
\tag{2.10}
$$

Proof. See [18] for a general proof. □

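A curl result analogous to Green's theorem for divergence follows from [14] (Theorem
2.11 in Chapter I):

$$
(\nabla \times \mathbf{z}, \boldsymbol{\phi}) = (\mathbf{z}, \nabla \times \boldsymbol{\phi}) - \int_{\partial\Omega} \boldsymbol{\phi} \cdot (\mathbf{n} \times \mathbf{z}) \, ds
\tag{2.11}
$$

for $\mathbf{z} \in H(\mathrm{curl}; \Omega)$ and $\boldsymbol{\phi} \in H^1(\Omega)^n$.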

Finally, we summarize results from [14] that we will need for $G_2$ in the next section. The
first inequality follows from Theorems 3.7-3.9 in [14], while the second inequality follows
from Lemmas 3.4 and 3.6 in [14].

Theorem 2.1. Assume that the domain $\Omega$ is a bounded convex polyhedron or has $C^{1,1}$
boundary. Then for any vector function $\mathbf{v}$ in either $H_0(\mathrm{div}; \Omega) \cap H(\mathrm{curl}; \Omega)$ or $H(\mathrm{div}; \Omega) \cap
H_0(\mathrm{curl}; \Omega)$, we have

$$
\|\mathbf{v}\|_1^2 \le C \left( \|\mathbf{v}\|^2 + \|\nabla \cdot \mathbf{v}\|^2 + \|\nabla \times \mathbf{v}\|^2 \right).
\tag{2.12}
$$

If in addition, the domain is simply connected, then

$$
\|\mathbf{v}\|_1^2 \le C \left( \|\nabla \cdot \mathbf{v}\|^2 + \|\nabla \times \mathbf{v}\|^2 \right).
\tag{2.13}
$$


3. First-Order System Least Squares. In this section, we consider least-squares 
functionals based on system (2.6) and its extension (2.8). Our primary objective here is to 
establish ellipticity of these least-squares functionals in the appropriate Sobolev spaces. 

Our first least-squares functional is defined in terms of appropriate weights and norms 
of the residuals for system (2.6): 

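$$
G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{f}, g) = \|\mathbf{f} + \nu (\nabla \cdot \mathbf{U})^t - \nabla p\|_{-1}^2 + \nu^2 \|\mathbf{U} - \nabla \mathbf{u}^t\|^2
+ \nu^2 \|\nabla \cdot \mathbf{u} + \delta p - g\|^2.
\tag{3.1}
$$

Note the use of the $H^{-1}$ norm in the first term here. Our second functional is defined as a
weighted sum of the $L^2$ norms of the residuals for system (2.8):

$$
\begin{aligned}
G_2(\mathbf{U}, \mathbf{u}, p; \mathbf{f}, g) ={}& \|\mathbf{f} + \nu (\nabla \cdot \mathbf{U})^t - \nabla p\|^2 + \nu^2 \|\mathbf{U} - \nabla \mathbf{u}^t\|^2 \\
& + \nu^2 \|\nabla \cdot \mathbf{u} + \delta p - g\|^2 + \nu^2 \|\nabla \times \mathbf{U}\|^2 + \nu^2 \|\nabla \operatorname{tr} \mathbf{U} + \delta \nabla p - \nabla g\|^2.
\end{aligned}
\tag{3.2}
$$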

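Let

$$
\mathbf{V}_1 = L^2(\Omega)^{n^2} \times H_0^1(\Omega)^n \times L_0^2(\Omega)
\quad \text{and} \quad
\mathbf{V}_2 = \mathbf{V}_0 \times H_0^1(\Omega)^n \times (H^1(\Omega)/\mathbb{R}),
$$

where

$$
\mathbf{V}_0 = \{ \mathbf{V} \in H^1(\Omega)^{n^2} : \mathbf{n} \times \mathbf{V} = \mathbf{0} \text{ on } \partial\Omega \}.
$$

Note that $\mathbf{V}_2 \subset \mathbf{V}_1$. For $i = 1$ or $2$, the first-order system least-squares variational problem
for the Stokes equations is to minimize the quadratic functional $G_i(\mathbf{U}, \mathbf{u}, p; \mathbf{f}, g)$ over $\mathbf{V}_i$:
find $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{V}_i$ such that

$$
G_i(\mathbf{U}, \mathbf{u}, p; \mathbf{f}, g) = \inf_{(\mathbf{V}, \mathbf{v}, q) \in \mathbf{V}_i} G_i(\mathbf{V}, \mathbf{v}, q; \mathbf{f}, g).
\tag{3.3}
$$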


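Theorem 3.1. There exists a constant $C$ independent of $\nu$ such that for any $(\mathbf{U}, \mathbf{u}, p) \in
\mathbf{V}_1$ we have

$$
\frac{1}{C} \left( \nu^2 \|\mathbf{U}\|^2 + \nu^2 \|\mathbf{u}\|_1^2 + \|p\|^2 \right) \le G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0)
\tag{3.4}
$$

and

$$
G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) \le C \left( \nu^2 \|\mathbf{U}\|^2 + \nu^2 \|\mathbf{u}\|_1^2 + \|p\|^2 \right).
\tag{3.5}
$$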


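Proof. Upper bound (3.5) is straightforward from the triangle and Cauchy-Schwarz
inequalities. We proceed to show the validity of (3.4) for $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{W}_1 = H(\mathrm{div}; \Omega)^n \times
H_0^1(\Omega)^n \times (L_0^2(\Omega) \cap H^1(\Omega))$. Then (3.4) would follow for $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{V}_1$ by continuity. For
any $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{W}_1$ and $\boldsymbol{\phi} \in H_0^1(\Omega)^n$, we have

$$
\begin{aligned}
(\nabla p, \boldsymbol{\phi}) &= (-\nu (\nabla \cdot \mathbf{U})^t + \nabla p, \boldsymbol{\phi}) - \nu (\mathbf{U}, \nabla \boldsymbol{\phi}^t) \\
&\le \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \|\boldsymbol{\phi}\|_1 + \nu \|\mathbf{U}\| \, \|\nabla \boldsymbol{\phi}\|.
\end{aligned}
$$

Hence, by Lemma 2.1, we have

$$
\|p\| \le C \left( \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} + \nu \|\mathbf{U}\| \right).
\tag{3.6}
$$

From (3.6) and the Poincaré-Friedrichs inequality on $\mathbf{u}$ we have

$$
\begin{aligned}
\nu^2 \|\nabla \mathbf{u}^t\|^2
&= \nu^2 (\nabla \mathbf{u}^t - \mathbf{U}, \nabla \mathbf{u}^t) + \nu (-\nu (\nabla \cdot \mathbf{U})^t + \nabla p, \mathbf{u}) + \nu (p, \nabla \cdot \mathbf{u} + \delta p) - \nu \delta (p, p) \\
&\le \nu^2 \|\nabla \mathbf{u}^t - \mathbf{U}\| \, \|\nabla \mathbf{u}^t\| + \nu \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \|\mathbf{u}\|_1 + \nu \|p\| \, \|\nabla \cdot \mathbf{u} + \delta p\| \\
&\le \left( \nu \|\nabla \mathbf{u}^t - \mathbf{U}\| + C \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \right) \nu \|\nabla \mathbf{u}^t\| \\
&\quad + C \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \, \nu \|\nabla \cdot \mathbf{u} + \delta p\| + C \nu^2 \|\mathbf{U}\| \, \|\nabla \cdot \mathbf{u} + \delta p\|.
\end{aligned}
$$

Using the $\varepsilon$-inequality, $2ab \le \frac{1}{\varepsilon} a^2 + \varepsilon b^2$, with $\varepsilon = 1$ for the first two products yields

$$
\nu^2 \|\nabla \mathbf{u}^t\|^2 \le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) + C \nu^2 \|\mathbf{U}\| \, \|\nabla \cdot \mathbf{u} + \delta p\|.
\tag{3.7}
$$

Again from (3.6) and the Poincaré-Friedrichs inequality on $\mathbf{u}$ we have

$$
\begin{aligned}
\nu^2 \|\mathbf{U}\|^2
&= \nu^2 (\mathbf{U} - \nabla \mathbf{u}^t, \mathbf{U}) + \nu (\mathbf{u}, -\nu (\nabla \cdot \mathbf{U})^t + \nabla p) + \nu (\nabla \cdot \mathbf{u} + \delta p, p) - \nu \delta (p, p) \\
&\le \nu^2 \|\mathbf{U} - \nabla \mathbf{u}^t\| \, \|\mathbf{U}\| + C \nu \|\nabla \mathbf{u}^t\| \, \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} + \nu \|p\| \, \|\nabla \cdot \mathbf{u} + \delta p\| \\
&\le \nu^2 \|\mathbf{U} - \nabla \mathbf{u}^t\| \, \|\mathbf{U}\| + C \nu \|\nabla \mathbf{u}^t\| \, \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \\
&\quad + C \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|_{-1} \, \nu \|\nabla \cdot \mathbf{u} + \delta p\| + C \nu^2 \|\mathbf{U}\| \, \|\nabla \cdot \mathbf{u} + \delta p\|.
\end{aligned}
$$

Using the $\varepsilon$-inequality on the first three products and (3.7), we then have

$$
\begin{aligned}
\nu^2 \|\mathbf{U}\|^2 &\le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) + C \nu^2 \|\nabla \mathbf{u}^t\|^2 + C \nu^2 \|\mathbf{U}\| \, \|\nabla \cdot \mathbf{u} + \delta p\| \\
&\le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) + C \nu^2 \|\mathbf{U}\| \, \|\nabla \cdot \mathbf{u} + \delta p\|.
\end{aligned}
$$

Again using the $\varepsilon$-inequality, we find that

$$
\nu^2 \|\mathbf{U}\|^2 \le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0).
\tag{3.8}
$$

Using (3.8) in (3.6) and (3.7), we now have that

$$
\|p\|^2 \le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) \quad \text{and} \quad \nu^2 \|\nabla \mathbf{u}^t\|^2 \le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0).
$$

The theorem now follows from these bounds, (3.8), and the Poincaré-Friedrichs inequality
on $\mathbf{u}$. □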

The next two lemmas will be useful in the proof of Theorem 3.2. 

Lemma 3.1. (Poincaré-Friedrichs-type inequality.) Suppose that the assumptions of
Theorem 2.1 hold. Let $p \in H^1(\Omega)$ satisfy $\int_\Omega p \, dz = 0$; then

$$
\|p\| \le C |p|_1,
\tag{3.9}
$$

where $C$ depends only on $\Omega$. Further, let $\mathbf{q} \in (H_0^1(\Omega) \cap H^2(\Omega))^n$; then

$$
\|\nabla \cdot \mathbf{q}\| \le C |\nabla \cdot \mathbf{q}|_1,
\tag{3.10}
$$

where $C$ depends only on $\Omega$.

Proof. Equation $\int_\Omega p \, dz = 0$ implies $p = 0$ at some point in $\Omega$. The first result now
follows from the standard Poincaré-Friedrichs inequality. The second result follows from
the fact that $\int_\Omega \nabla \cdot \mathbf{q} \, dz = 0$. □

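Lemma 3.2. Under the assumptions of Theorem 2.1 with simply connected $\Omega$, for any
$p$ in $H^1(\Omega)$ we have:

($n = 2$) let $\boldsymbol{\phi} = (\phi_1, \phi_2)^t$ and $\mathbf{q} = (q_1, q_2)^t$; if each $q_i \in H_0^1(\Omega) \cap H^2(\Omega)$ and each $\phi_i \in H^1(\Omega)$
is such that $\Delta \phi_i \in L^2(\Omega)$ and $\mathbf{n} \cdot \nabla \phi_i = 0$ on $\partial\Omega$, then

$$
|\nabla \cdot \mathbf{q} + \delta p|_1^2 \le C \left( |\nabla \cdot \mathbf{q} + \operatorname{tr} \nabla^{\perp} \boldsymbol{\phi}^t + \delta p|_1^2 + \|\Delta \boldsymbol{\phi}\|^2 \right);
\tag{3.11}
$$

($n = 3$) let $\boldsymbol{\Phi} = (\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \boldsymbol{\phi}_3)$ and $\mathbf{q} = (q_1, q_2, q_3)^t$; if each $q_i \in H_0^1(\Omega) \cap H^2(\Omega)$ and each
$\boldsymbol{\phi}_i \in H^1(\Omega)^3$ is divergence free with $\Delta \boldsymbol{\phi}_i \in L^2(\Omega)^n$ and $\mathbf{n} \times (\nabla \times \boldsymbol{\phi}_i) = \mathbf{0}$ on $\partial\Omega$,
then

$$
|\nabla \cdot \mathbf{q} + \delta p|_1^2 \le C \left( |\nabla \cdot \mathbf{q} + \operatorname{tr} \nabla \times \boldsymbol{\Phi} + \delta p|_1^2 + \|\Delta \boldsymbol{\Phi}\|^2 \right).
\tag{3.12}
$$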

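Proof. ($n = 2$) The assumptions of Theorem 2.1 are sufficient to guarantee $H^2$-regularity
of the Laplace equation on $\Omega$; that is, the second inequality in the equation

$$
|\nabla \times \boldsymbol{\phi}|_1 \le C |\boldsymbol{\phi}|_2 \le C \|\Delta \boldsymbol{\phi}\|.
$$

Note that, for $\nabla^{\perp} q = (\partial_2 q, -\partial_1 q)^t$, we have $\operatorname{tr} (\nabla^{\perp} \phi_1, \nabla^{\perp} \phi_2) = -\nabla \times \boldsymbol{\phi}$, so that
$|\operatorname{tr} \nabla^{\perp} \boldsymbol{\phi}^t|_1 = |\nabla \times \boldsymbol{\phi}|_1$. Then, from the above and the triangle inequality, we
have

$$
|\nabla \cdot \mathbf{q} + \delta p|_1^2 \le 2 \left( |\nabla \cdot \mathbf{q} + \operatorname{tr} \nabla^{\perp} \boldsymbol{\phi}^t + \delta p|_1^2 + |\nabla \times \boldsymbol{\phi}|_1^2 \right)
\le C \left( |\nabla \cdot \mathbf{q} + \operatorname{tr} \nabla^{\perp} \boldsymbol{\phi}^t + \delta p|_1^2 + \|\Delta \boldsymbol{\phi}\|^2 \right),
$$

which is (3.11).

($n = 3$) Bound (2.13) with $\mathbf{v} = \nabla \times \boldsymbol{\Phi}$ and identity (2.5) applied to each column of $\nabla \times \boldsymbol{\Phi}$
imply that

$$
|\operatorname{tr} \nabla \times \boldsymbol{\Phi}|_1^2 \le 3 |\nabla \times \boldsymbol{\Phi}|_1^2 \le C \left( \|\nabla \cdot \nabla \times \boldsymbol{\Phi}\|^2 + \|\nabla \times \nabla \times \boldsymbol{\Phi}\|^2 \right) = C \|\Delta \boldsymbol{\Phi}\|^2
$$

since each $\boldsymbol{\phi}_i$ is divergence free. Eqn. (3.12) now follows from the triangle inequality as for
the case $n = 2$. □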

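Theorem 3.2. Assume that the domain $\Omega$ is a bounded convex polyhedron or has $C^{1,1}$
boundary and that regularity bound (2.9) holds. Then, there exists a constant $C$ independent
of $\nu$ such that for any $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{V}_2$, we have

$$
\frac{1}{C} \left( \nu^2 \|\mathbf{U}\|_1^2 + \nu^2 \|\mathbf{u}\|_1^2 + \|p\|_1^2 \right) \le G_2(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0)
\tag{3.13}
$$

and

$$
G_2(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) \le C \left( \nu^2 \|\mathbf{U}\|_1^2 + \nu^2 \|\mathbf{u}\|_1^2 + \|p\|_1^2 \right).
\tag{3.14}
$$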


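Proof. Upper bound (3.14) is straightforward from the triangle and Cauchy-Schwarz
inequalities. To prove (3.13), note that the $H^{-1}$ norm of a function is always bounded by
its $L^2$ norm. Since $\mathbf{V}_2 \subset \mathbf{V}_1$, then $G_1 \le G_2$ on $\mathbf{V}_2$. Hence, by Theorem 3.1, we have

$$
\nu^2 \|\mathbf{U}\|^2 + \nu^2 \|\mathbf{u}\|_1^2 + \|p\|^2 \le C G_1(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0) \le C G_2(\mathbf{U}, \mathbf{u}, p; \mathbf{0}, 0).
\tag{3.15}
$$

From Theorem 2.1 and (3.9), we have

$$
\nu^2 \|\mathbf{U}\|_1^2 + \|p\|_1^2 \le C \left( \nu^2 \|\mathbf{U}\|^2 + \nu^2 \|(\nabla \cdot \mathbf{U})^t\|^2 + \nu^2 \|\nabla \times \mathbf{U}\|^2 + \|\nabla p\|^2 \right).
\tag{3.16}
$$

It thus suffices to show that

$$
C \left( \nu^2 \|(\nabla \cdot \mathbf{U})^t\|^2 + \|\nabla p\|^2 \right)
\le \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|^2 + \nu^2 |\operatorname{tr} \mathbf{U} + \delta p|_1^2 + \nu^2 \|\nabla \times \mathbf{U}\|^2.
\tag{3.17}
$$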


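We will prove (3.17) only for the case $n = 3$ because the proof for $n = 2$ is similar. First, we
assume that the domain $\Omega$ is simply connected with connected boundary. Since $\mathbf{n} \times \mathbf{U} = \mathbf{0}$
on $\partial\Omega$, the following decomposition is admitted:

$$
\mathbf{U} = \nabla \mathbf{q}^t + \nabla \times \boldsymbol{\Phi},
\tag{3.18}
$$

where $\mathbf{q} \in H_0^1(\Omega)^n \cap H^2(\Omega)^n$ and $\boldsymbol{\Phi}$ is columnwise divergence free with $\mathbf{n} \times (\nabla \times \boldsymbol{\Phi}) = \mathbf{0}$ on
$\partial\Omega$. Here, we choose $\mathbf{q}$ to satisfy

$$
\begin{cases}
\Delta \mathbf{q} = (\nabla \cdot \mathbf{U})^t, & \text{in } \Omega, \\
\mathbf{q} = \mathbf{0}, & \text{on } \partial\Omega.
\end{cases}
\tag{3.19}
$$

Then, $\mathbf{V} = \mathbf{U} - \nabla \mathbf{q}^t$ is divergence free and satisfies $\mathbf{n} \times \mathbf{V} = \mathbf{0}$. Since $\Omega$ has connected
boundary we know that $\int_{\partial\Omega} \mathbf{n} \cdot \mathbf{V} = \mathbf{0}^t$. Thus, Theorem 3.4 in [14] yields $\mathbf{V} = \nabla \times \boldsymbol{\Phi}$, where
$\nabla \cdot \boldsymbol{\Phi} = \mathbf{0}^t$.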
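By taking the curl of both sides of this decomposition, it is easy to see that

$$
\|\Delta \boldsymbol{\Phi}\| = \|\nabla \times \mathbf{U}\| \le \|\mathbf{U}\|_1,
\tag{3.20}
$$

so that $\|\Delta \boldsymbol{\Phi}\|$ is bounded and Lemma 3.2 applies. Hence,

$$
\begin{aligned}
& \|{-\nu} (\nabla \cdot \mathbf{U})^t + \nabla p\|^2 + \nu^2 |\operatorname{tr} \mathbf{U} + \delta p|_1^2 + \nu^2 \|\nabla \times \mathbf{U}\|^2 \\
&\quad = \|{-\nu} \Delta \mathbf{q} + \nabla p\|^2 + \nu^2 |\nabla \cdot \mathbf{q} + \operatorname{tr} \nabla \times \boldsymbol{\Phi} + \delta p|_1^2 + \nu^2 \|\Delta \boldsymbol{\Phi}\|^2
&& \text{(by equation (3.18))} \\
&\quad \ge \|{-\nu} \Delta \mathbf{q} + \nabla p\|^2 + C \nu^2 |\nabla \cdot \mathbf{q} + \delta p|_1^2
&& \text{(by Lemma 3.2)} \\
&\quad \ge \|{-\nu} \Delta \mathbf{q} + \nabla p\|^2 + C \nu^2 \|\nabla \cdot \mathbf{q} + \delta p\|_1^2
&& \text{(by Lemma 3.1)} \\
&\quad \ge C \left( \nu^2 \|\Delta \mathbf{q}\|^2 + \|\nabla p\|^2 \right)
&& \text{(by regularity assumption (2.9) with } \mathbf{u} = \mathbf{q}) \\
&\quad = C \left( \nu^2 \|\nabla \cdot \mathbf{U}\|^2 + \|\nabla p\|^2 \right)
&& \text{(by equation (3.18))}.
\end{aligned}
$$

This proves (3.17) and, hence, the theorem for simply connected $\Omega$.

The proof for general $\Omega$ (i.e., when we assume only that $\partial\Omega$ is $C^{1,1}$) now follows by an
argument similar to the proof of Theorem 3.7 in [14]. □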

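We now show that the last two terms in the definition of $G_2$ are necessary for the
bound (3.13) to hold, even with the extra boundary condition $\mathbf{n} \times \mathbf{U} = \mathbf{0}$. We consider the
Stokes equations, so that $\delta = 0$. Suppose first that we omit the term $\|\nabla \times \mathbf{U}\|^2$ but include
the term $\|\nabla \operatorname{tr} \mathbf{U}\|^2$. We offer a two-dimensional counterexample; a three-dimensional
counterexample can be constructed in a similar manner. Let $\nu = 1$, $\mathbf{u} = \mathbf{0}$, and $p = 0$.
Choose any $\omega \in \mathcal{D}(\Omega)$ such that $\Delta \nabla \omega \ne \mathbf{0}$ and define

$$
\mathbf{U} = \nabla^{\perp} (\nabla \omega)^t.
$$

Clearly, $\mathbf{n} \times \mathbf{U} = \mathbf{0}$. It is easy to show that

$$
\nabla \cdot \mathbf{U} = \mathbf{0} \quad \text{and} \quad \operatorname{tr} \mathbf{U} = -\nabla \times (\nabla \omega) = 0.
$$

However, with $\nabla^{\perp} q = (\partial_2 q, -\partial_1 q)^t$,

$$
(\nabla \times \mathbf{U})^t = -\Delta \nabla \omega \ne \mathbf{0}
$$

by construction. Thus,

$$
G_2(\mathbf{U}, \mathbf{u}, p; 0) = \|\mathbf{U}\|^2,
$$

which cannot bound $\|\mathbf{U}\|_1^2$. That is, since $\omega \in \mathcal{D}(\Omega)$ is arbitrary, we may choose it so
oscillatory that $\|\mathbf{U}\|_1 / \|\mathbf{U}\|$ is as large as we like. This prevents the bound (3.13) from
holding.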

Next suppose we include the $\|\nabla \times \mathbf{U}\|^2$ term but omit the $\|\nabla \operatorname{tr} \mathbf{U}\|^2$ term. Now set
$\Omega = (0, 1)^2$, $\nu = 1$, $\mathbf{u} = \mathbf{0}$, and $p = \cos(k \pi x_1) \sin(\pi x_2)$ and choose $q_i$ to satisfy

$$
\begin{cases}
-\Delta q_i = -\partial_i p, & \text{in } \Omega, \\
q_i = 0, & \text{on } \partial\Omega,
\end{cases}
$$

for $i = 1, 2$. Then

$$
q_1 = \frac{k}{\pi (k^2 + 1)} \sin(k \pi x_1) \sin(\pi x_2).
$$

We also know that

$$
\|\nabla q_2\| \le C \|\partial_2 p\| = C \|\pi \cos(k \pi x_1) \cos(\pi x_2)\| \le C,
$$

where $C$ is independent of $k$. Now set

$$
\mathbf{U}_i = \nabla q_i
$$

for $i = 1, 2$. Then $\mathbf{n} \times \mathbf{U}_i = \mathbf{0}$ and

$$
G_2(\mathbf{U}, \mathbf{u}, p; 0) = \|\Delta \mathbf{q} - \nabla p\|^2 + \|\nabla \mathbf{q}\|^2 = \|\nabla \mathbf{q}\|^2 \le C,
$$

where $C$ is independent of $k$. On the other hand, we have

$$
\|p\|_1 \ge C k,
$$

which again prevents the bound (3.13) from holding.
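The growth in this second counterexample can be made concrete with closed-form norms on the unit square. The sketch below is illustrative only: the function name is hypothetical, it covers just the $q_1$ component (the $q_2$ component is bounded independently of $k$ by the estimate in the text), and it relies on the elementary fact that products such as $\sin(k \pi x_1) \sin(\pi x_2)$ have squared $L^2$ norm $1/4$ on $(0,1)^2$ for integer $k \ge 1$.

```python
import math

def counterexample_norms(k):
    """Return (||grad q1||^2, ||p||_1^2) on (0,1)^2 for integer k >= 1.

    Assumes p = cos(k*pi*x1) * sin(pi*x2) and
    q1 = c * sin(k*pi*x1) * sin(pi*x2) with c = k / (pi * (k^2 + 1)).
    """
    if k < 1 or int(k) != k:
        raise ValueError("k is assumed to be a positive integer")
    c = k / (math.pi * (k * k + 1))
    # |d1 q1|^2 + |d2 q1|^2 integrated over the square: c^2 * pi^2 * (k^2 + 1) / 4
    grad_q1_sq = c * c * math.pi ** 2 * (k * k + 1) / 4.0
    # ||p||^2 = 1/4 and |p|_1^2 = pi^2 * (k^2 + 1) / 4
    p_h1_sq = 0.25 + math.pi ** 2 * (k * k + 1) / 4.0
    return grad_q1_sq, p_h1_sq
```

For increasing `k` the first returned value stays below $1/4$ while the second grows like $k^2$, which is the mismatch that rules out a bound of the form (3.13) once the trace term is dropped.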

4. Concluding Remarks. Full regularity assumption (2.9) is needed in Theorem 3.2
only to obtain full $H^1$ product ellipticity of augmented functional $G_2$ in (3.2). This somewhat
restrictive assumption is not necessary for functional $G_1$ in (3.1), which supports
an efficient practical algorithm (the $H^{-1}$ norm in (3.1) can be replaced by a discrete inverse
norm or a simpler mesh weighted norm; see [5] and [8] for analogous inverse norm
algorithms) and which has the weaker norm equivalence assured by Theorem 3.1.

Nevertheless, the principal result of this paper is Theorem 3.2, which establishes full $H^1$
product ellipticity of least-squares functional $G_2$ for the generalized Stokes system. Since we
have assumed full $H^2$-regularity of the original Stokes (linear elasticity) equations, we may
then use this result to establish optimal finite element approximation estimates and optimal
multiplicative and additive multigrid convergence rates. This can be done in precisely the
same way that these results were established for general second-order elliptic equations (see
[10], Sections 3-5). We therefore omit this development here. However, it is important to
recognize that the ellipticity property is independent of the Reynolds parameter $\nu$ (Lamé
constants $\mu$ and $\lambda$). This automatically implies that the optimal finite element discretization
error estimates and multigrid convergence factor bounds are uniform in $\nu$ ($\lambda$ and $\mu$). At
first glance, it might appear that the scaling of some of the $H^1$ product norm components
might create a scale dependence of our discretization and algebraic convergence estimates.
However, the results in [10] are based only on assumptions posed in an unscaled $H^1$ product
norm, in which the individual variables are completely decoupled; and since the constant $\nu$
appears only as a simple factor in individual terms of the scaled $H^1$ norm, these assumptions
are equally valid in this case. On the other hand, for problems where the necessary $H^1$
scaling is not (essentially) constant, extension of the theory of Sections 3-5 of [10] is not
straightforward. Such is the case for convection-diffusion equations, which will be treated
in a forthcoming paper.





REFERENCES 


[1] S. Agmon, A. Douglis, and L. Nirenberg, Estimates near the boundary for solutions of elliptic partial differential equations satisfying general boundary conditions II, Comm. Pure Appl. Math., 17 (1964), pp. 35-92.

[2] A. K. Aziz, R. B. Kellogg, and A. B. Stephens, Least-squares methods for elliptic systems, Math. Comp., 44 (1985), pp. 53-70.

[3] P. B. Bochev, Personal communication, San Diego, July, 1994.

[4] P. B. Bochev, Analysis of least-squares finite element methods for the Navier-Stokes equations, submitted.

[5] P. B. Bochev and M. D. Gunzburger, Accuracy of least-squares methods for the Navier-Stokes equations, Comput. Fluids, 22 (1993), pp. 549-563.

[6] P. B. Bochev and M. D. Gunzburger, Analysis of least-squares finite element methods for the Stokes equations, Math. Comp., to appear.

[7] P. B. Bochev and M. D. Gunzburger, Least-squares methods for the velocity-pressure-stress formulation of the Stokes equations, Comput. Methods Appl. Mech. Engrg., to appear.

[8] J. H. Bramble, R. D. Lazarov, and J. E. Pasciak, A least-squares approach based on a discrete minus one inner product for first order system, manuscript.

[9] Z. Cai, R. D. Lazarov, T. Manteuffel, and S. McCormick, First-order system least squares for partial differential equations: Part I, SIAM J. Numer. Anal., 31 (1994).

[10] Z. Cai, T. Manteuffel, and S. McCormick, First-order system least squares for partial differential equations: Part II, SIAM J. Numer. Anal., submitted.

[11] C. L. Chang, A mixed finite element method for the Stokes problem: an acceleration-pressure formulation, Appl. Math. Comp., 36 (1990), pp. 135-146.

[12] C. L. Chang, An error estimate of the least squares finite element methods for the Stokes problem in three dimensions, Math. Comp., 63 (1994), pp. 41-50.

[13] L. P. Franca and R. Stenberg, Error analysis of some Galerkin least squares methods for the linear elasticity equations, SIAM J. Numer. Anal., 28 (1991), pp. 1680-1697.

[14] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations: Theory and Algorithms, Springer-Verlag, New York, 1986.

[15] B. Jiang, C. Loh, and L. Povinelli, Theoretical study of the incompressible Navier-Stokes equations by the least-squares method, NASA Tech. Memo. 106535, ICOMP-94-04.

[16] R. B. Kellogg and J. E. Osborn, A regularity result for the Stokes problem in a convex polygon, J. Funct. Anal., 21 (1976), pp. 397-431.

[17] O. A. Ladyzhenskaya, The Mathematical Theory of Viscous Incompressible Flow, Gordon and Breach, New York, 1963.

[18] J. Nečas, Équations aux Dérivées Partielles, Presses de l'Université de Montréal, 1965.

[19] R. Temam, Navier-Stokes Equations, North-Holland, New York, 1977.




TOWARDS AN FVE-FAC METHOD FOR DETERMINING 
THERMOCAPILLARY EFFECTS ON WELD POOL SHAPE 


David Canright and Van Emden Henson 
Mathematics Dept., Code MA 
Naval Postgraduate School 
Monterey, CA 93943 


SUMMARY 

Several practical materials processes, e.g., welding, float-zone purification, and 
Czochralski crystal growth, involve a pool of molten metal with a free surface, with 
strong temperature gradients along the surface. In some cases, the resulting ther- 
mocapillary flow is vigorous enough to convect heat toward the edges of the pool, 
increasing the driving force in a sort of positive feedback. In this work we examine 
this mechanism and its effect on the solid-liquid interface through a model problem: 
a half space of pure substance with concentrated axisymmetric surface heating, where 
surface tension is strong enough to keep the liquid free surface flat. The numerical 
method proposed for this problem utilizes a finite volume element (FVE) discretiza- 
tion in cylindrical coordinates. Because of the axisymmetric nature of the model 
problem, the control volumes used are toroidal prisms, formed by taking a polygonal
cross-section in the (r, z) plane and sweeping it completely around the z-axis. Conservation
of energy (in the solid), and conservation of energy, momentum, and mass
(in the liquid) are enforced globally by integrating these quantities and enforcing con- 
servation over each control volume. Judicious application of the Divergence Theorem 
and Stokes’ Theorem, combined with a Crank-Nicolson time-stepping scheme leads 
to an implicit algebraic system to be solved at each time step. 

It is known that near the boundary of the pool, that is, near the solid-liquid 
interface, the full conduction-convection solution will require extremely fine length 
scales to resolve the physical behavior of the system. Furthermore, this boundary 
moves as a function of time. Accordingly, we develop the foundation of an adaptive 
refinement scheme based on the principles of Fast Adaptive Composite Grid methods 
(FAC). Implementation of the method and numerical results will appear in a later 
report. 


INTRODUCTION 

Several practical materials processes, e.g., welding, float-zone purification, and 
Czochralski crystal growth, involve a pool of molten metal with a free surface, with 
strong temperature gradients along the surface. In many cases (e.g., laser welding)
convection in the liquid metal is driven primarily by thermocapillary forces, and even 
in cases where other forces are stronger overall, thermocapillary forces may still be 
dominant near the edge of the pool [4]. Previous work [2] showed how vigorous ther- 
mocapillary convection can lead to localized intense heat transfer and high velocities 
in the “cold corner” region where the liquid free surface meets the solid. 

The present work examines how this localized heat transfer modifies the shape of 
the solid-liquid interface bounding the pool. When convection is vigorous, the high 
heat flux in the corner may melt away the solid near the surface, resulting in a sort 
of “lip” around the edge of the pool. This phenomenon is modeled computationally, 
and the steady solution sought for a wide range of the two governing parameters. 
This is a work in progress, in which numerical methods are proposed and developed 
for the problem. Implementation of the method and numerical results will appear in 
a later report. 


PROBLEM STATEMENT 


A half-space of a pure material is subjected to concentrated heating on the flat 
horizontal surface, giving a pool of molten material surrounded by solid. The total 
heat flux $Q$ is constant, and far away the solid approaches the uniform cold temperature
$T_c$ (see Figure 1). Above the horizontal free surface is an inviscid, nonconducting
gas. Surface tension of the liquid is assumed strong enough to keep the free surface 
flat (small Capillary number), but with surface tension variations due to a linear 
dependence on temperature. The resulting thermal and flow fields are assumed to 
be axisymmetric and steady, but the time-dependent equations are given below, to 


facilitate a numerical approach using time-like iterations to reach the steady solution. 

Then the system is governed by conservation of energy in the solid and by 
servation of energy, momentum, and mass in the pool: 

con- 

solid 

: § = * v2r 

(1) 

liquid 

dT 

: — + u-VT = kV 2 T 
ot 

(2) 


<9u 1 2 

— + u • V u = — V p + v V 2 u 
dt p 

( 3 ) 


V • u = 0 

( 4 ) 


with the conditions at the 

solid surface 
liquid surface 


boundaries and at the solid-liquid interface given by 


(* = 0 ) 
(* = 0 ) 


£-0 

oz 

, dT 
v = 0 


-q(r) 


( 5 ) 

(6) 

( 7 ) 
Figure 1: Problem Formulation: a half-space of pure material is subjected to concentrated
surface heating Q that results in a molten pool. (Outside the surface heating,
the surface is adiabatic.) The melting temperature is $T_m$, and far away the solid is at
the cooler temperature $T_c$. The flat liquid surface is subject to thermocapillary forcing,
which drives convection in the liquid. Axisymmetry is assumed.



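                          $\mu\,\dfrac{\partial u}{\partial z} = -\gamma\,\dfrac{\partial T}{\partial r}$    (8)

axis (r = 0):             $\dfrac{\partial T}{\partial r} = 0$    (9)

                          $u = 0$    (10)

                          $\dfrac{\partial v}{\partial r} = 0$    (11)

far away ($r, z \to \infty$):    $T \to T_c$    (12)

interface ($r = f(z,t)$):  $T = T_m$    (13)

                          $u = v = 0$    (14)

                          $-(k\nabla T)_l = -(k\nabla T)_s + \rho L V(z,t)$    (15)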


Here $T$ is temperature, $t$ is time, $\kappa$ is thermal diffusivity, $\mathbf{u}$ is the velocity vector with
components $u$ and $v$ in the $r$ and $z$ directions (cylindrical coordinates), $\rho$ is density,
$p$ is pressure, $\nu$ is kinematic viscosity, $k$ is thermal conductivity, $q(r)$ is the imposed
surface heat flux (large at $r = 0$, falling off to zero at some small value of $r$, such that
$\int_0^\infty q(r)\,2\pi r\,dr = Q$), $\mu$ is viscosity, $\gamma$ (assumed constant and positive) is the negative
of the derivative of the surface tension with respect to temperature, $T_m$ is the melting
temperature, $r = f(z,t)$ gives the position of the solid-liquid interface, $L$ is the latent
heat of fusion, and $V(z,t)$ is the normal velocity of the phase-change interface (that
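is,

    $V = \dfrac{\partial f/\partial t}{\sqrt{1 + (\partial f/\partial z)^2}}$

where the unit normal vector is

    $\hat{n} = \dfrac{\hat{r} - (\partial f/\partial z)\,\hat{z}}{\sqrt{1 + (\partial f/\partial z)^2}}$

in terms of the coordinate unit vectors).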

To nondimensionalize the equations, we use a heat flux scale of $Q$ and a temperature
scale (relative to the cold temperature) of $\Delta T = T_m - T_c$. Then thermal conduction
gives the length scale $d = Q/k\Delta T$ (so $q$ scales as $Q/d^2 = (k\Delta T)^2/Q$), the
thermocapillary coupling gives the velocity scale $u_s = \gamma\Delta T/\mu$, and the convection time
scale is $t_c = d/u_s = \mu Q/k\gamma\Delta T^2$. The viscous pressure scale is $\mu u_s/d = k\gamma\Delta T^2/Q$.
From the phase-change condition, the phase-change time scale is $t_p = \rho L Q^2/(k\Delta T)^3$.

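The resulting dimensionless equations are

solid:    $\mathrm{Ma}\,\dfrac{\partial T}{\partial t} = \nabla^2 T$    (16)

liquid:   $\mathrm{Ma}\left(\dfrac{\partial T}{\partial t} + \mathbf{u}\cdot\nabla T\right) = \nabla^2 T$    (17)

          $\mathrm{Re}\left(\dfrac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right) = -\nabla p + \nabla^2 \mathbf{u}$    (18)

          $\nabla\cdot\mathbf{u} = 0$    (19)

with the boundary conditions

solid surface (z = 0):    $\dfrac{\partial T}{\partial z} = 0$    (20)

liquid surface (z = 0):   $\dfrac{\partial T}{\partial z} = -q$    (21)

                          $v = 0$    (22)

                          $\dfrac{\partial u}{\partial z} = -\dfrac{\partial T}{\partial r}$    (23)

axis (r = 0):             $\dfrac{\partial T}{\partial r} = 0$    (24)

                          $u = 0$    (25)

                          $\dfrac{\partial v}{\partial r} = 0$    (26)

far away ($r, z \to \infty$):    $T \to 0$    (27)

interface ($r = f(z,t)$):  $T = 1$    (28)

                          $u = v = 0$    (29)

                          $-\nabla T_l = -\nabla T_s + \Lambda V$    (30)
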

where from this point on the variables denote the dimensionless quantities. The main
dimensionless parameters are the Marangoni number $\mathrm{Ma} = u_s d/\kappa = \gamma Q/\mu\kappa k$ and
the Reynolds number $\mathrm{Re} = u_s d/\nu$. Their ratio gives the Prandtl number: $\mathrm{Pr} =
\nu/\kappa = \mathrm{Ma}/\mathrm{Re}$. The other dimensionless parameter is the ratio of time scales, $\Lambda =
t_p/t_c = \gamma Q L/\nu k^2 \Delta T$, which multiplies $V$ in (30) and so plays no role in the steady-state solution, where $V \to 0$.
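
As a quick consistency check on the scales and dimensionless groups defined above, the following minimal sketch evaluates them from a set of physical inputs. The function name and the numerical values are illustrative assumptions, not taken from the paper; only the formulas come from the text.

```python
def thermocapillary_scales(Q, k, kappa, nu, rho, gamma, L, T_m, T_c):
    """Characteristic scales and dimensionless groups of the weld-pool model.

    All arguments are dimensional; the names mirror the symbols in the text
    (gamma = -d(sigma)/dT, L = latent heat of fusion).
    """
    dT = T_m - T_c                # temperature scale
    mu = rho * nu                 # dynamic viscosity
    d = Q / (k * dT)              # conduction length scale
    u_s = gamma * dT / mu         # thermocapillary velocity scale
    t_c = d / u_s                 # convection time scale
    t_p = rho * L * Q**2 / (k * dT)**3   # phase-change time scale
    return {
        "d": d, "u_s": u_s, "t_c": t_c, "t_p": t_p,
        "Ma": u_s * d / kappa,    # Marangoni number
        "Re": u_s * d / nu,       # Reynolds number
        "Pr": nu / kappa,         # Prandtl number
        "Lambda": t_p / t_c,      # ratio of time scales
    }


# Hypothetical inputs in SI units, chosen only to exercise the formulas.
scales = thermocapillary_scales(Q=100.0, k=30.0, kappa=6e-6, nu=8e-7,
                                rho=7000.0, gamma=4e-4, L=2.5e5,
                                T_m=1800.0, T_c=300.0)
```

The identities Pr = Ma/Re and $\Lambda = \gamma Q L/\nu k^2 \Delta T$ hold for any inputs, which makes them convenient sanity checks when changing materials.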

For the numerical solutions, it is convenient to eliminate the pressure by adopting 
a stream-function/vorticity formulation for the flow: 


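    $\mathrm{Re}\left[\dfrac{\partial \boldsymbol{\omega}}{\partial t} - \nabla\times(\mathbf{u}\times\boldsymbol{\omega})\right] = -\nabla\times\nabla\times\boldsymbol{\omega}$    (31)

    $\boldsymbol{\omega} = \nabla\times\nabla\times\left(\dfrac{\Psi}{r}\,\hat{\theta}\right)$    (32)

    $u = -\dfrac{1}{r}\dfrac{\partial \Psi}{\partial z}, \qquad v = \dfrac{1}{r}\dfrac{\partial \Psi}{\partial r}$    (33)

where $\Psi$ is the axisymmetric stream function and $\boldsymbol{\omega}$ is the vorticity vector (having
only one component, in the $\theta$ direction), with the flow boundary conditions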


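liquid surface (z = 0):   $\Psi = 0$    (34)

                          $\omega = -\dfrac{\partial T}{\partial r}$    (35)

axis (r = 0):             $\Psi = 0$    (36)

                          $\omega = 0$    (37)

interface ($r = f(z,t)$):  $\Psi = \dfrac{\partial \Psi}{\partial r} = \dfrac{\partial \Psi}{\partial z} = 0$    (38)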


With the assumption of small Capillary number, the resulting small surface de- 
flection can be determined as a small perturbation to the flat interface from the 
dimensionless normal stress condition at the surface: 


    $-p + 2\,\dfrac{\partial v}{\partial z} = \mathrm{Ca}^{-1}\,\dfrac{1}{r}\dfrac{d}{dr}\left(r\,\dfrac{dh}{dr}\right)$    (39)


where $\mathrm{Ca} = \gamma\,\Delta T/\sigma$ is the Capillary number for surface tension $\sigma$, and the deflection
$z = h(r)$ is taken positive upward. The contact line at the edge of the pool is assumed
pinned ($h = 0$), and volume is conserved globally to determine the constant reference
pressure level.


CONDUCTION SOLUTIONS 


As a starting point for the numerical method, an analytic solution for the temperature
in the conductive limit is used; this limit corresponds to $\mathrm{Ma} \to 0$ (for which
the time scale used in nondimensionalizing is inappropriate). If the unit surface heat
input were concentrated at a single point, then the conductive solution would have
spherical symmetry:

    $T(r,z) = \dfrac{1}{2\pi R}, \qquad R = \sqrt{r^2 + z^2}$    (40)
For a distributed (axisymmetric) heat source $q(r)$, the point source solution (40) can
be used as a Green's function, and the solution found by superposition:

    $T(r,z) = \displaystyle\int_0^\infty\!\!\int_0^{2\pi} \frac{q(\rho)\,\rho\,d\theta\,d\rho}{2\pi\sqrt{\rho^2 + r^2 - 2\rho r\cos\theta + z^2}}$    (41)

    $\phantom{T(r,z)} = \displaystyle\int_0^\infty \frac{q(\rho)\,\rho}{\sqrt{(\rho + r)^2 + z^2}}\; {}_2F_1\!\left(\tfrac12, \tfrac12; 1; \frac{4\rho r}{(\rho + r)^2 + z^2}\right) d\rho$    (42)

where ${}_2F_1$ is the generalized hypergeometric function (see [1]). This formula can be
used to find the temperature for any input heating distribution $q$, and the isotherm
$T = 1$ specifies the interface position.
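
The superposition formula (42) can be evaluated numerically without special-function libraries, because ${}_2F_1(\tfrac12,\tfrac12;1;m) = 1/\mathrm{AGM}(1,\sqrt{1-m})$. The sketch below is illustrative only: the uniform-disk source, the midpoint quadrature, and all function names are assumptions, and the source is normalized so that $\int_0^\infty q(\rho)\,2\pi\rho\,d\rho = 1$ to match the unit point source (40).

```python
import math


def hyp2f1_half(m):
    """2F1(1/2, 1/2; 1; m) for 0 <= m < 1 via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(40):                      # AGM converges quadratically
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 1.0 / a


def conduction_temperature(r, z, q, rho_max, n=400):
    """Midpoint-rule evaluation of (42) for a source q(rho) vanishing beyond rho_max.

    Intended for z > 0, where the argument of 2F1 stays below 1.
    """
    h = rho_max / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        s = (rho + r) ** 2 + z ** 2
        total += q(rho) * rho * hyp2f1_half(4.0 * rho * r / s) / math.sqrt(s) * h
    return total


# Hypothetical source: uniform heating over a disk of radius a, unit total input.
a = 0.1
uniform_disk = lambda rho: 1.0 / (math.pi * a * a)
T_far = conduction_temperature(3.0, 4.0, uniform_disk, a)   # close to 1/(2*pi*5)
```

Far from the source the result approaches the point-source value $1/(2\pi R)$, and on the axis it can be compared with the closed form $(\sqrt{a^2+z^2} - z)/(\pi a^2)$ for the uniform disk.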

Using this thermal solution with the interface position fixed, the flow equations 
(31)-(38) are solved numerically in the viscous limit $\mathrm{Re} \to 0$ (again, the time scale
used is inappropriate in this limit). This gives the basic state, which has no fine 
details (except near the concentrated heating, where the flow can be described by an 
asymptotic solution [3]). This state is used as a starting point for solutions with low 
Ma and high Pr. 


NUMERICAL METHODS 


For computational purposes, the idealized problem of an unbounded solid is trun- 
cated to a finite domain in cylindrical coordinates, extending in both the radial and 
vertical directions a distance of four times the diffusion length scale d. The boundary 
condition on this artificial boundary is that the temperature should decay in the same 
way as the conduction solution for the point source, that is, 


    $\dfrac{\partial T}{\partial R} = -\dfrac{T}{R}$    (43)

where $R = \sqrt{r^2 + z^2}$ is the spherical coordinate. This asymptotic matching condition
is reasonable (for several diffusion lengths away from the pool) and is far less restrictive 
than imposing the Dirichlet condition (T = 0) on the outer boundary. 

To calculate the steady state for various values of Ma and Pr, the time-dependent 
equations are stepped in time using the Crank-Nicolson method to obtain the advantages
of absolute stability and large time steps. Then at each time step, an elliptic
problem must be solved. For this, multilevel methods are used, based on a uniform 
grid in the (r, z) quarter-plane and the Fast Adaptive Composite (FAC) grid approach 
to ensure resolution of all small-scale local details. At the solid-liquid interface, each 
grid has irregular elements to fit the interface. At each time step, the position of 
the interface is adjusted based on the normal velocity V from (30). (Note that the 
dimensionless parameter A in (30) can be adjusted to control how quickly the inter- 
face changes.) The difference equations on the grid are developed using the Finite 
Volume Element (FVE) method. This method combines the exact conservation of 
mass, momentum, and energy of the finite volume method with the flexibility of the 
finite element method in handling complicated boundary conditions, irregular grids,
etc. (See [5] for an introduction to FAC and FVE methods.) The resulting system 
of algebraic equations is solved at each time step. FAC is a method in which the 
solutions at the various grid levels are used to correct the composite grid solution, 
and the type of solver used on each grid level is unimportant. In this work both 
direct methods and iterative solution by line relaxation are used as solvers at each 
grid level. 
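
To illustrate why Crank-Nicolson time stepping leads to an elliptic-type solve at every step and tolerates large time steps, here is a minimal sketch on a one-dimensional model problem, $\mathrm{Ma}\,\partial T/\partial t = \partial^2 T/\partial x^2$ with fixed end values. It is an assumption-laden stand-in: the paper's actual discretization is the axisymmetric FVE scheme with FAC, and the tridiagonal (Thomas) solver here merely takes the place of the multilevel solver.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        den = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / den
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / den
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson_step(T, dt, h, Ma=1.0):
    """Advance Ma*T_t = T_xx by one step; T[0] and T[-1] are held fixed."""
    n = len(T) - 2                            # number of interior unknowns
    s = dt / (2.0 * Ma * h * h)
    # Explicit half of the operator applied to the old solution.
    rhs = [T[i] + s * (T[i - 1] - 2.0 * T[i] + T[i + 1]) for i in range(1, n + 1)]
    # Known boundary values from the implicit half move to the right-hand side.
    rhs[0] += s * T[0]
    rhs[-1] += s * T[-1]
    interior = thomas([-s] * n, [1.0 + 2.0 * s] * n, [-s] * n, rhs)
    return [T[0]] + interior + [T[-1]]


# One very large step applied to a sine profile: the amplitude stays bounded.
N = 32
profile = [math.sin(math.pi * i / N) for i in range(N + 1)]
profile[0] = profile[-1] = 0.0
after = crank_nicolson_step(profile, dt=10.0, h=1.0 / N)
```

For a discrete sine mode the step multiplies the amplitude by $(1 - \lambda\,\Delta t/2)/(1 + \lambda\,\Delta t/2)$, whose magnitude is below one for every $\Delta t > 0$.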


FVE STENCILS 


To recapitulate, the complete system of dimensionless equations is 


solid: 

liquid: 
(44)

(45) 

(46) 

(47) 

(48) 

(49) 

(50) 

(51) 

(52) 

(53) 

(54) 

(55) 

(56) 

(57) 
(59)


where $n$ refers to the direction normal to the interface (outward). (Note: $\int_0^\infty q(r)\,r\,dr = 1$.)


Figure 2: FVE Grid: the orientation of the triangular finite elements (solid) and the 
square finite volumes (dashed) are shown. On each triangular element, the variables 
are assumed linear between the three nodes. This allows consistent calculation of the 
gradients across the volume boundaries. Note that this is only a cross section in the 
$(r, z)$ plane; the volumes extend in the $\theta$ direction to form rings.

The Finite Volume Element (FVE) approach to discretizing the system involves 
decomposing the domain in two ways: as the union of a set of elements, whose 
vertices compose the set of grid points on which the unknowns are defined; and as 
the union of a set of control volumes, one for each grid point (see Figure 2). The 
unknowns are interpolated over each element, based on the values at the grid points, 
giving a continuous representation over the whole domain. This representation is 
used to integrate the conservation equations over each control volume. Hence, each 
control volume gives three equations involving the three unknowns at the associated 
grid point, as well as the values at neighboring points. The resulting set of discrete 
equations for the finite element representation of the solution satisfies the conservation 
laws exactly over any volume made up of the union of control volumes, including the 
whole domain. (Actually, the boundary conditions may eliminate some of the control 
volumes.) 

Figure 3: From axisymmetry, each control volume results from sweeping the square 
cross-section in the (r, z) plane about the z axis, giving a toroidal prism shape. Hence, 
the uniform grid gives control volumes that increase with radial position. 

For this axisymmetric problem, each control volume is a toroidal prism, the result
of taking a polygonal cross-section in the $(r, z)$ plane and sweeping it all the way
around in the $\theta$ direction (see Figure 3). Then, integrating the convection-diffusion
equation (45) over a control volume, interchanging time derivatives and spatial
integrals, and applying the divergence theorem gives

    $\dfrac{d}{dt}\displaystyle\iint_A T\,r\,dr\,dz + \oint_C \hat{n}\cdot(\mathbf{u}T)\,r\,dl = \mathrm{Ma}^{-1}\oint_C \hat{n}\cdot\nabla T\,r\,dl$    (60)

where the $2\pi$ resulting from integration in $\theta$ has been factored out, $A$ refers to the
cross-sectional area (polygon) of the volume, $C$ refers to the closed curve bounding
that cross-section, and $\hat{n}$ is the unit vector normal (outward) to $C$.
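
The factor $r$ in the area integrals of (60) is what makes the control volumes grow with radius on a uniform grid. The following minimal sketch computes the weight $\iint_A r\,dr\,dz$ of each control-volume cross-section on a uniform grid with half-squares on the boundaries and quarter-squares at the corners; the function name and the grid indexing are assumptions made for illustration.

```python
def control_volume_weights(N, h):
    """weights[j][k] = integral of r dr dz over the control-volume cross-section
    centered on the grid point (r = k*h, z = j*h), for k, j = 0..N.

    Cross-sections are squares of side h clipped to the domain [0, N*h]^2.
    The true ring volume is 2*pi times the weight.
    """
    def clipped(i):
        return max(0.0, (i - 0.5) * h), min(N * h, (i + 0.5) * h)

    weights = []
    for j in range(N + 1):
        z_lo, z_hi = clipped(j)
        row = []
        for k in range(N + 1):
            r_lo, r_hi = clipped(k)
            row.append(0.5 * (r_hi ** 2 - r_lo ** 2) * (z_hi - z_lo))
        weights.append(row)
    return weights


w = control_volume_weights(N=8, h=0.5)
total = sum(sum(row) for row in w)   # equals (N*h)**2 / 2 * (N*h)
```

An interior weight reduces to $r h^2$, so the weights increase linearly with radial position, and their sum reproduces the integral of $r$ over the whole computational domain.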

For the vorticity (46) and stream function equations (47), the control volume is
a vorticity tube, and the appropriate integral is over the cross-sectional area $A$ (with
normal vector $\hat{\theta}$). Then, applying Stokes' theorem gives

    $\mathrm{Re}\left[\dfrac{d}{dt}\displaystyle\iint_A \boldsymbol{\omega}\cdot\hat{\theta}\,dr\,dz - \oint_C \hat{t}\cdot(\mathbf{u}\times\boldsymbol{\omega})\,dl\right] = -\displaystyle\oint_C \hat{t}\cdot(\nabla\times\boldsymbol{\omega})\,dl$    (61)

    $\displaystyle\iint_A \boldsymbol{\omega}\cdot\hat{\theta}\,dr\,dz = \oint_C \hat{t}\cdot\mathbf{u}\,dl$    (62)

where $\hat{t}$ is the unit vector tangent to $C$, in the positive $\theta$ sense.

Except near the phase-change interface, a uniform grid is applied with step size $h$
in both the $r$ and $z$ directions (see Figure 2). (Portions of this grid may be subdivided
into smaller uniform grids by the FAC method.) Each square of the grid is divided
into two triangular elements by a diagonal (in the direction of increasing $r + z$), and
linear interpolation is used over each triangular element. The control volume cross
sections are squares of side $h$, centered on each grid point (except for half-squares at
the boundaries and small quarter-squares at the corners).

Figure 4: The conservation integrals for each control volume cross-section involve six
separate area integrals over the six triangular elements adjoining the central point;
the line integrals involve eight separate parts (the NW and SE elements each contain
two segments).

Then in the integrated conservation equations (60, 61, 62), the area integrals are 
over six triangular regions (portions of the six elements), and the line integrals are 
over four line segments, each with halves in two different elements (see Figure 4). In 
terms of components, the integrated equations are 
(65)


where from here onward, $\omega$ refers to the one nonzero component of vorticity, and
the labels N, E, S, and W refer to the four line segments of the line integrals by
“compass direction” relative to the central node.

Substituting the piecewise linear element representation of the unknowns into the 
above integrals gives the discrete (in space) equations. (We use Maple to evaluate and 
sum the integrals for these equations.) The equations are then presented in stencil 
notation. In stencil notation, for example 


    $\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} T,$

where the center of the matrix, $e$, is the coefficient of $T$ at the gridpoint $P$, the other
entries ($a$, $b$, ...) in the matrix are the coefficients of the values of the unknown ($T$) at
the neighboring gridpoints, and $r$ and $z$ are the horizontal and vertical coordinates of $P$,
respectively. Blank entries indicate zero coefficients, and a central S indicates the sum
of all the other coefficients in the matrix. Note that in the nonlinear convective terms,
each of the coefficients of $T$ or $\omega$ is itself expressed as a stencil in $\Psi$ (each centered at
the same point $P$); to save space, the $\Psi$ is left out of the vorticity convection stencil.
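
Stencil notation is shorthand for a weighted sum over the nine grid values surrounding $P$. A minimal sketch of that operation follows; the array layout (`T[j][k]` with `j` indexing $z$ and `k` indexing $r$) and the convention that the top stencil row multiplies row `j + 1` are assumptions made for this example, and the five-point Laplacian is used only as a familiar test stencil, not as one of the paper's FVE stencils.

```python
def apply_stencil(stencil, T, k, j):
    """Return the sum of stencil[a][b] * T over the 3x3 neighborhood of (k, j).

    stencil is a 3x3 nested list; zero entries play the role of blank entries.
    Assumed orientation: stencil rows run from j+1 (top) to j-1 (bottom),
    stencil columns from k-1 (left) to k+1 (right).
    """
    total = 0.0
    for a, dj in enumerate((1, 0, -1)):
        for b, dk in enumerate((-1, 0, 1)):
            coeff = stencil[a][b]
            if coeff != 0:
                total += coeff * T[j + dj][k + dk]
    return total


# Familiar test case (unit grid spacing): the five-point Laplacian.
laplace5 = [[0, 1, 0],
            [1, -4, 1],
            [0, 1, 0]]
grid = [[k * k + j * j for k in range(5)] for j in range(5)]   # T = r^2 + z^2
value = apply_stencil(laplace5, grid, 2, 2)
```

Applied to $T = r^2 + z^2$ on a unit grid, the five-point stencil returns the exact Laplacian, 4.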

At a typical grid point, the discretized equations become 
where the internal stencils
are employed and $r$ is the radial coordinate at the central point $P$. Note that for
those integrals in $r$ with integrands containing $1/r$, that factor was pulled outside the
integral to avoid logarithms; the error introduced is of the same order as that due 
to the piecewise linear representation itself. Also, the heat equation was rescaled by 
1/r, and the stream function equation was rescaled by r. 

The radial dependence of the coefficients is a direct result of the axisymmetric 
geometry. This dependence makes the calculation somewhat more complicated than 
the corresponding two-dimensional problem. But far from the axis, where $r \gg h$ and
hence $\epsilon \ll 1$, the equations approach the corresponding two-dimensional versions,
facilitating comparison. 
Discretized Boundary Conditions


Along the surface z = 0, each of the three boundary conditions for the three 
unknowns requires different treatment. The temperature at each grid point along 
the surface is determined by a heat balance over the corresponding control volume, 
with a half-square cross section ( h x h/2). The contribution of the surface to the 
convective flux integral is zero, since there is no velocity normal to the surface, and 
the contribution of the surface to the diffusive flux integral is given by the Neumann 
type boundary condition $\int q(r)\,r\,dr$. The resulting discrete equation is
Here we specify the heat flux as a symmetric function of $r$ that decays smoothly to
zero at some finite radius $\rho_{max}$, while satisfying $\int_0^\infty q(r)\,r\,dr = 1$:


q(r) = { 



5 : Pmax 
T > Pmax 


(70) 


For the calculations, we use p max = 

The thermocapillary stress condition at the surface specifies the vorticity $\omega$ in terms
of the radial temperature gradient $\partial T/\partial r$.
However, because of the linear interpolation between grid points, $\partial T/\partial r$ is not well defined
at grid points on the surface. Hence, for the surface only, the vorticity is specified
at half-grid points (i.e., $r = (i + \tfrac12)h$), and triangular finite elements are formed
with neighboring points. This keeps the discretization of this important condition at
the same order of accuracy as the other equations, but entails special treatment of
the grid points next to the surface. The surface is also a streamline, where $\Psi = 0$
(Dirichlet condition). Using that fact and these special surface vorticity elements
gives the following flow equations for points by the surface (a distance $h$ from the
surface):
Along the $z$ axis, symmetry requires that there is no heat flux across the axis,
nor flow, nor shear stress, so both $\Psi$ and $\omega$ are zero there. Then for points on the
axis, the discrete heat balance over the cylindrical control volumes (half-square cross
section $h/2 \times h$) gives:
Ma


T = 0 (73) 


where the equation was scaled using the average $r = h/4$. The homogeneous Dirichlet
conditions on $\Psi$ and $\omega$ apply to points on the axis, and for grid points neighboring
the axis, the usual stencils apply; no special treatment is necessary.

The temperature at the grid point at the origin is determined by a small control 
volume (quarter-square cross section h/2 x h/2) with specified surface heat flux and 
no flux (nor convection) through the axis: 
Again, at the origin, both $\Psi$ and $\omega$ are zero (note the two boundary conditions on
vorticity are consistent at this point, due to the symmetry). Hence, the usual surface
flow equations apply to the grid point next to the origin.

At the far boundaries of the computational domain, the boundary condition on 
the heat diffusion equation in the solid is that it decays in the same way as the 
spherically symmetric solution for a point source: 

    $\nabla T = \dfrac{\partial T}{\partial R}\,\hat{R} = -\dfrac{T}{R}\,\hat{R} = -\dfrac{T r}{R^2}\,\hat{r} - \dfrac{T z}{R^2}\,\hat{z}$

where $R = \sqrt{r^2 + z^2}$. This allows the heat flux across the artificial boundary to
be computed in terms of the temperature there, a Robin type boundary condition.
Below we give the discrete equations for the two edges (half-square volumes) and 
three corners (quarter-square volumes) where this boundary condition is applied. 
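
The Robin-type closure can be checked against the point-source solution it is modeled on. The sketch below is a hypothetical helper, not the paper's stencil code: it evaluates the normal temperature gradient implied by $\partial T/\partial R = -T/R$ on a boundary with a given outward normal, and compares it with a finite-difference derivative of $T = 1/(2\pi R)$.

```python
import math


def point_source_temperature(r, z):
    """Dimensionless conduction solution for a unit point source, T = 1/(2*pi*R)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(r * r + z * z))


def far_field_normal_gradient(T, r, z, normal):
    """n . grad(T) implied by dT/dR = -T/R, with normal = (n_r, n_z)."""
    n_r, n_z = normal
    return -T * (n_r * r + n_z * z) / (r * r + z * z)


# Check on the outer radial edge (normal in +r) at an arbitrary boundary point.
r_b, z_b, eps = 4.0, 1.5, 1e-5
robin = far_field_normal_gradient(point_source_temperature(r_b, z_b), r_b, z_b, (1.0, 0.0))
finite_diff = (point_source_temperature(r_b + eps, z_b)
               - point_source_temperature(r_b - eps, z_b)) / (2.0 * eps)
```

On the edge where $r$ is at its maximum the outward normal is $\hat{r}$, so the flux involves the factor $r/(r^2 + z^2)$; multiplied by $h$, this is the quantity that appears in the edge stencil.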

At the edge where r is at its maximum the stencil is given by 
where $p = hr/(r^2 + z^2)$.

At the edge where $z$ is at its maximum the stencil is

At the corner where both $r$ and $z$ are at a maximum the stencil is

At the corner where $r = 0$ and $z$ is at a maximum we have the stencil

Finally, at the corner where $r$ is maximum and $z = 0$ the stencil is


Tracking the Phase-Change Interface


One of the biggest challenges in models of phase change is the tracking over 
time of the position of the two-phase interface. As one of the main goals of the 
current research is the examination of the effects of thermocapillary convection on
the interface shape, great care is necessary in accurately modeling the geometry and 
dynamics of the phase change process. 

The grid structure must be modified near the interface. (While it would be possible 
to quantize the interface position to lie on grid points, that would make moving the 
interface difficult and would introduce errors that would be magnified in the multilevel 
representation.) We represent the interface as piecewise linear between the points at 
which it crosses the diagonals of the main grid, which have slopes equal to 1. This 
representation assumes that the interface orientation never reaches an angle of 135° 
(or —45°) relative to the surface (i.e., parallel to the main diagonals); this seems 
reasonable, considering the interface is an isotherm that meets the surface at 90° and 
must end at 0° at the axis. (A more general approach would include representations
for several different local grid orientations.) 

The movement of the interface through melting or solidification is governed by 
the local heat balance near the interface. Hence the main requirement for the control 
volume around each interface point (along the diagonals) is that the volume contain 
the interface both at the current time and at the next time step, that is, the control 
volumes must allow room for movement. (Then for the next time step, new control 
volumes may be used.) Hence, not only the current interface position, but also an 
estimate of the future position, is required to construct the current local grid. An 
alternate approach is to adjust the solidification timescale parameter A at each time 
step to constrain the maximum motion of the interface to remain within the interface 
control volumes; physically this would correspond to time-dependent latent heat L. 

To keep the geometry as simple as possible while allowing the interface points to 
move along the main diagonals, we construct the control volumes on a diagonal grid. 
(Here we refer to the control volumes by their cross sections in the (r, z) plane.) The 
main diagonals are spaced a distance $h/\sqrt{2}$ apart, and control volume boundaries in
that direction lie midway between them. Control volume boundaries in the perpendicular
direction are spaced the same, unless such a boundary would cross the current
or predicted interface, in which case that segment is removed, giving a double-wide
volume ($\sqrt{2}\,h \times h/\sqrt{2}$). [Note: it is conceivable that, if the interface orientation
exceeds 90°, triple-wide control volumes may be necessary.] Then any grid points
within the interface control volumes are removed. If space remains between the interface
control volume and the remaining regular grid, an auxiliary grid point is inserted
on the diagonal a distance $h/\sqrt{2}$ from the regular grid point, with its diagonal square
control volume ($h/\sqrt{2} \times h/\sqrt{2}$). [Note: to simplify the programming, the auxiliary
points could be omitted; then the control volumes for the interface points will be
either single width (no grid point removed) or triple width (one grid point removed).]
Then the control volumes for the regular grid points adjoining this diagonal grid are 
pentagons in one of three configurations: at an “inside” corner with one diagonal side, 
two regular sides, and two regular half-sides; at a straight edge, either horizontal or 
vertical, with one regular side, two regular half-sides, and two diagonal sides; or, at 
an “outside” corner with three diagonal sides and two regular half-sides. 

The auxiliary grid points form triangular elements with neighboring regular grid 
points and/or neighboring auxiliary grid points. This leaves trapezoidal elements 
adjoining the interface. Note that triangulating these trapezoids could result in very 
complicated relations between elements and volumes. Therefore we use a “warped” 
bilinear interpolation on these trapezoidal elements. 

Where the interface intersects the surface or the axis, the grid must be further 
modified to track these important points. This involves computing the heat balance 
on a diagonal surface (or axis) control volume and tracking the position of the interface 
along the diagonal. Depending on the proximity of the interface point on the diagonal
to the surface (or axis), either the interface is extrapolated from the point inside
the surface perpendicularly to the surface, or the “interface point” on the diagonal
outside the surface is used to linearly interpolate the interface to the surface.

The interface is defined as the isotherm where $T = 1$, and on the interface the fluid
velocity is zero (no slip), and so $\Psi = 0$. The unknown vorticity at interface points is
determined by the stream function equation integrated over the liquid portion of the
control volume; here the circulation can be calculated (with no contribution along
the interface, due to no slip) to find the unknown strength of the vorticity tube:

    $\displaystyle\iint_{A_l} \omega\,dr\,dz = \oint_{C_l} \hat{t}\cdot\mathbf{u}\,dl$    (76)

where $A_l$ is the liquid area, with bounding curve $C_l$. (Note that this equation contains
no time derivative.)

The only remaining unknown is the future position (along the diagonal) of the 
interface point. This is governed by the heat balance over the liquid and solid portions 
of the control volume: 


    $\Lambda\,\mathrm{Ma}^{-1}\,\dfrac{d}{dt}\displaystyle\iint_{A_l} r\,dr\,dz = -\dfrac{d}{dt}\iint_{A_l + A_s} T\,r\,dr\,dz - \oint_{C_l} \hat{n}\cdot(\mathbf{u}T)\,r\,dl + \mathrm{Ma}^{-1}\oint_{C_l + C_s} \hat{n}\cdot\nabla T\,r\,dl$    (77)

where $A_l + A_s$ indicates the entire control volume, with bounding curve $C_l + C_s$.
(Note that $A_l$ and $C_l$ vary over the time step, while the control volume as a whole
does not.) The interpretation is that the heat coming in by convection and diffusion
goes to raise the temperature inside (though the interface temperature is fixed) and
to melt some solid, increasing the liquid portion of the volume (the first term).

The discrete equations are very complicated for regular grid points bordering 
the diagonal interface grid and for interface points, and so are not reproduced here. 
To guard against typographical errors, the stencils were derived using the symbolic 
mathematics capabilities of the Maple software ([6]). Maple converted the resulting
expressions into C language code, which was cut and pasted directly into our
simulation code. 

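The diagonal grid around the interface requires local diagonal coordinates. We
call these $(x, y)$, where

    $x = z + r, \qquad y = z - r$    (78)

so that

    $r = (x - y)/2, \qquad z = (x + y)/2$

Then the velocity becomes

    $\mathbf{u} = -\dfrac{\sqrt{2}}{r}\dfrac{\partial \Psi}{\partial y}\,\hat{x} + \dfrac{\sqrt{2}}{r}\dfrac{\partial \Psi}{\partial x}\,\hat{y}$    (79)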
Note that the $(x, y)$ coordinates are scaled down in length by a factor of $\sqrt{2}$ relative
to the $(r, z)$ coordinates, and on the diagonal grid the values of the $(x, y)$ coordinates
are integer multiples of $h$ (rescaled). In the area integrals, the Jacobian gives $dr\,dz =
dx\,dy/2$, but in each of the line integrals, the scaling of the differential is exactly
compensated by the scaling of the derivative (with respect to $x$ or $y$). The slight
complication of rescaling is more than offset by the simplification of the algebra;
otherwise factors of $\sqrt{2}$ would abound. The bilinear interpolation for the trapezoidal
elements is also in terms of the $(x, y)$ coordinates, both to simplify integration with
respect to the diagonal coordinates, and to avoid the singular case where the trapezoid
is a diagonal perfect square, which cannot be interpolated with a bilinear form in $(r, z)$.
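
The change to diagonal coordinates and the area factor $dr\,dz = dx\,dy/2$ can be verified directly. In this minimal sketch the function names are illustrative assumptions; the area of the image of a rectangle in $(x, y)$ is computed with the shoelace formula in the $(r, z)$ plane.

```python
def to_diagonal(r, z):
    """Map (r, z) to the diagonal coordinates x = z + r, y = z - r."""
    return z + r, z - r


def from_diagonal(x, y):
    """Inverse map: r = (x - y)/2, z = (x + y)/2."""
    return 0.5 * (x - y), 0.5 * (x + y)


def rz_area_of_xy_rectangle(x0, x1, y0, y1):
    """Area in the (r, z) plane of the image of the rectangle [x0,x1] x [y0,y1]."""
    corners = [from_diagonal(x, y) for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))]
    twice_area = 0.0
    for i in range(4):
        r_a, z_a = corners[i]
        r_b, z_b = corners[(i + 1) % 4]
        twice_area += r_a * z_b - r_b * z_a       # shoelace formula
    return 0.5 * abs(twice_area)


area = rz_area_of_xy_rectangle(0.0, 1.0, 0.0, 1.0)   # half of the (x, y) area
```

A unit square in $(x, y)$ maps to a diamond of area $1/2$ in $(r, z)$, in agreement with the Jacobian quoted above.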


Conclusion 

In this preliminary work we have developed a finite volume element method for deter- 
mining the shape of the weldpool. The governing equations and boundary conditions 
have been discretized in space, and a time-stepping method can be applied to solve 
the equations. An FAC method has been devised for resolving the fine details near 
the moving interface and is being implemented as part of the continuing research. 

The basic numerical methods discussed have been implemented in code and tested. 
A future report will describe the details of the time-stepping, the FAC resolution near 
the interface, and the numerical results on the total problem. 
SUMMARY


Fast methods are proposed for solving the system $K_N x = b$ resulting from the
discretization of self-adjoint elliptic equations in three dimensional domains by the
spectral element method. The domain is decomposed into hexahedral elements, and
in each of these elements the discretization space is formed by polynomials of degree $N$
in each variable. Gauss-Lobatto-Legendre (GLL) quadrature rules replace the integrals
in the Galerkin formulation. This system is solved by the preconditioned conjugate
gradients method. The conforming finite element space on the GLL mesh consisting of
piecewise $Q_1$ elements produces a stiffness matrix $K_h$ that is spectrally equivalent to
the spectral element stiffness matrix $K_N$. The action of the inverse of $K_h$ is expensive
for large problems, and is therefore replaced by a Schwarz preconditioner $B_h$ of this
finite element stiffness matrix. The preconditioned operator then becomes $B_h^{-1} K_N$.

The technical difficulties stem from the nonregularity of the mesh. Tools to esti- 
mate the convergence of a large class of new iterative substructuring and overlapping 
Schwarz preconditioners are developed. This technique also provides a new analysis 
for an iterative substructuring method proposed by Pavarino and Widlund for the 
spectral element discretization. 


INTRODUCTION 


In the past decade, many preconditioners have been developed for the large systems 
of linear equations arising from the finite element discretization of elliptic self-adjoint 
partial differential equations; see e.g. [5], [10], [21]. An especially challenging problem 
is the design of preconditioners for three dimensional problems. More recently, spec- 
tral element discretizations of such equations have been proposed, and their efficiency 
has been demonstrated; see [11], [12], and references therein. In large scale problems, 
long range interactions of the basis elements produce quite dense and expensive fac-
torizations of the stiffness matrix, and the use of direct methods is not economical due 
to the large memory requirements [9]. 

Early work on preconditioners for these equations was done by Pavarino [15], [16], 
[17]. Some of his algorithms are numerically scalable (i.e., the number of iterations 
is independent of the number of substructures) and optimal (the number of iterations 
does not grow or grows slowly with the degree of the polynomials). However, each 
application of the preconditioner can be very expensive. The bounds for the condition 
number of the preconditioned operator grow only slowly with the polynomial degrees, 
and are independent of the number of substructures. 

Following Pahl [13], who based his work on the work of Deville and Mund [6] 
and of Canuto [4], the above constructions give rise to different, spectrally equivalent 
preconditioners using block partitioning of the finite element matrix generated by 
$Q_1$ elements on the hexahedra of the Gauss-Lobatto-Legendre (GLL) mesh. This
observation and experiments for a model problem in two dimensions were made by 
Pahl [13], who demonstrated experimentally that this preconditioner is very efficient. 
Thus, high order accuracy can be combined with efficient and inexpensive low-order 
preconditioning. We remark that similar ideas also appear in [20] and references 
therein, and that the spectral equivalence results of Canuto [4] and generalizations for 
other boundary conditions were also obtained independently by Parter and Rothman 
[14]. 

The previous analysis of Schwarz preconditioners for piecewise linear finite ele- 
ments for the h-method has relied upon the shape regularity of the mesh [8], [7], [3], 
which clearly does not hold for the GLL meshes. We extend the analysis to such 
meshes, deriving estimates for these finite element preconditioners of spectral element 
methods. 

We give polylogarithmic bounds on the condition number of the preconditioned 
operators for iterative substructuring methods, as well as a new proof of one of the 
estimates in [18]. We remark that the tools developed here can be used to analyze 
overlapping Schwarz methods defined on the GLL mesh. 


DIFFERENTIAL AND DISCRETE MODEL PROBLEMS 


Let $\Omega$ be a bounded polyhedral region in $\mathbb{R}^3$ with diameter of order 1. We consider
the following elliptic self-adjoint problem:

    $a(u, v) = f(v) \quad \forall\, v \in H_0^1(\Omega),$    (1)

where

    $a(u, v) = \displaystyle\int_\Omega k(x)\,\nabla u\cdot\nabla v\,dx$ and $f(v) = \displaystyle\int_\Omega f v\,dx$ for $f \in L^2(\Omega)$.
This problem is discretized by the spectral element method (SEM); see [12]. 
Namely, we triangulate $\Omega$ into nonoverlapping substructures $\{\Omega_i\}_{i=1}^{M}$ of diameter on 
the order of H. Each $\Omega_i$ is the image of the reference cube $\hat\Omega = [-1,+1]^3$ under a 
mapping $F_i = D_i \circ G_i$, where $D_i$ is an isotropic dilation and $G_i$ is a $C^\infty$ mapping 
such that its derivative and the inverse of its derivative are uniformly bounded by a 
constant close to one. Moreover, we suppose that the intersection between the closures 
of two substructures is either empty, a vertex, a whole edge or a whole face. Each 
substructure $\Omega_i$ is a distorted cube. We notice that some additional properties of the 
mappings $F_i$ are required to guarantee an optimal convergence rate. We refer to [2], 
problem 2 and the references therein for further details on this issue, but remark that 
affine mappings are covered by the available convergence theory for these methods. 
We assume for simplicity that k(x) has the constant value $k_i > 0$ in the substructure 
$\Omega_i$, with possibly large jumps occurring only across substructure boundaries. Our 
estimates for iterative substructuring algorithms are independent of these jumps. 

We define the space $P^N(\hat\Omega)$ as the space of $Q_N$ functions, i.e. polynomials of 
degree at most N in each of the variables separately. The space $P^N(\Omega_i)$ is the space 
of functions $v_N$ such that $v_N \circ F_i$ belongs to $P^N(\hat\Omega)$. The conforming space $P_0^N(\Omega) \subset H_0^1(\Omega)$ 
is the space of continuous functions the restrictions of which to $\Omega_i$ belong to 
$P^N(\Omega_i)$ for $i = 1, \dots, M$. 

The discrete $L^2$ inner product is defined by 

$$(u,v)_N = \sum_{i=1}^{M} \sum_{j,k,l=0}^{N} (u \circ F_i)(\xi_j,\xi_k,\xi_l)\,(v \circ F_i)(\xi_j,\xi_k,\xi_l)\,|J_i|\,\rho_j\rho_k\rho_l, \tag{2}$$

where $\xi_j$ and $\rho_j$ are, respectively, the Gauss-Lobatto-Legendre (GLL) quadrature 
points and weights in the interval $[-1,+1]$; see [2]. 
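
As an illustration of the quadrature rule entering (2), the following minimal Python sketch computes GLL points and weights for a given degree N. It is not taken from the paper: the function name `gll_nodes_weights`, the use of NumPy, the indexing of the N+1 points from 0 to N, and the eigenvalue-based computation of the interior nodes are assumptions made here for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gll_nodes_weights(N):
    """Return GLL nodes and weights on [-1, 1] for polynomial degree N >= 2.

    Hypothetical helper, not part of the paper.
    """
    # The interior nodes are the roots of L_N', which coincide with the
    # eigenvalues of the symmetric Jacobi matrix of the Jacobi(1,1) polynomials.
    k = np.arange(1, N - 1)
    off = np.sqrt(k * (k + 2.0) / ((2.0 * k + 1.0) * (2.0 * k + 3.0)))
    jacobi = np.diag(off, 1) + np.diag(off, -1)
    interior = np.linalg.eigvalsh(jacobi)          # sorted, real
    x = np.concatenate(([-1.0], interior, [1.0]))

    # Classical weight formula: rho_j = 2 / (N (N + 1) L_N(x_j)^2).
    coeffs = np.zeros(N + 1)
    coeffs[N] = 1.0                                # Legendre series of L_N
    rho = 2.0 / (N * (N + 1) * leg.legval(x, coeffs) ** 2)
    return x, rho


if __name__ == "__main__":
    x, rho = gll_nodes_weights(5)
    print("nodes  :", x)
    print("weights:", rho)
    # The rule with N + 1 points integrates polynomials of degree <= 2N - 1.
    print("integral of x^8:", np.sum(rho * x ** 8), "exact:", 2.0 / 9.0)
```

Because the nodal basis functions vanish at all GLL points but one, inserting them into (2) leaves a single nonzero term per node, which is why the resulting mass matrix is diagonal.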

The discrete problem is: find $u_N \in P_0^N(\Omega)$, such that 

$$a_Q(u_N,v_N) = (\nabla u_N, \nabla v_N)_N = f(v_N) \quad \forall\, v_N \in P_0^N(\Omega). \tag{3}$$

We choose as basis functions the functions of $P_0^N(\Omega)$ that are one at the GLL 
node j and zero at the other nodes, which gives rise in the standard way to the 
linear system $K_N x = b$. Note that the mass matrix of this nodal basis generated 
by the discrete $L^2$ inner product (2) is diagonal. The analysis of the SEM 
just described and experimental evidence show that it achieves very good accuracy 
for reasonably small N for a wide range of problems; see [2], [12], and references 
therein. The practical application of this approach for large scale problems, however, 
depends on fast and reliable solution methods for the system $K_N x = b$. The condition 
number of $K_N$ is very large even for moderate values of N; see [2]. Our approach is 
to solve this system by a preconditioned conjugate gradient algorithm. The following 
low-order discretization is used to define several preconditioners in the next sections. 
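
For orientation, a generic preconditioned conjugate gradient iteration can be sketched as follows. This is a textbook version, not the authors' implementation; the function name `pcg`, the callable `apply_Binv` (standing for the action of the inverse of a preconditioner), and the small test matrix are assumptions introduced for this example.

```python
import numpy as np


def pcg(K, b, apply_Binv, tol=1e-10, maxit=200):
    """Solve K x = b for symmetric positive definite K.

    apply_Binv(r) is assumed to return B^{-1} r for an SPD preconditioner B.
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - K @ x                     # initial residual
    z = apply_Binv(r)                 # preconditioned residual
    p = z.copy()                      # first search direction
    rz = r @ z
    for it in range(1, maxit + 1):
        Kp = K @ p
        alpha = rz / (p @ Kp)         # step length along p
        x += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_Binv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new K-conjugate direction
        rz = rz_new
    return x, maxit


if __name__ == "__main__":
    n = 20
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    # Diagonal (Jacobi) preconditioner, used here only as a placeholder.
    x, its = pcg(K, b, lambda r: r / np.diag(K))
    print("iterations:", its, "residual:", np.linalg.norm(b - K @ x))
```

The number of iterations needed is governed by the condition number of the preconditioned operator, which is why the bounds derived in the following sections matter in practice.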
The GLL points define a triangulation $\mathcal{T}^h$ of $\hat\Omega$ into parallelepipeds, and on this 
triangulation we define the space $P^h(\hat\Omega)$ of continuous piecewise trilinear ($Q_1$) 
functions. The spaces $P^h(\Omega_i)$ and $P_0^h(\Omega)$ are defined analogously to $P^N(\Omega_i)$ and $P_0^N(\Omega)$. 
The finite element discrete problem associated with (1) is: find $u_h \in P_0^h(\Omega)$ such that 

$$a(u_h,v_h) = f(v_h) \quad \forall\, v_h \in P_0^h(\Omega). \tag{4}$$

The standard nodal basis $\{\hat\phi_j^h\}$ in $P^h(\hat\Omega)$ is mapped by the $F_i$, $1 \le i \le M$, into a basis 
for $P_0^h(\Omega)$. This basis also gives rise to a system $K_h x = b$ in the standard way. 

We use the following notations: $x \preceq y$, $z \succeq u$, and $v \asymp w$ to express that there 
are positive constants C and c such that 

$$x \le C y, \quad z \ge c u \quad \text{and} \quad c w \le v \le C w, \quad \text{respectively.}$$

Here and elsewhere c and C are moderate constants independent of H and N. 

Let $\hat h$ be the distance between the first two GLL points in the interval $[-1,+1]$; $\hat h$ 
is proportional to $1/N^2$ [2], and the sides $h_i$, $i = 1,2,3$, of an element K belonging to 
$\mathcal{T}^h$ satisfy 

$$1/N^2 \preceq h_i \preceq 1/N,$$

depending on the location of K inside $\hat\Omega$. The triangulation is therefore not shape 
regular. 
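
The two extreme mesh sizes can be made plausible with a short computation. The sketch below uses the Chebyshev-Gauss-Lobatto points $x_j = -\cos(j\pi/N)$ as a stand-in, on the assumption that they cluster near the endpoints in the same way as the GLL points; the constants for the GLL points differ, only the orders of magnitude are meant to carry over.

$$x_1 - x_0 = 1 - \cos\frac{\pi}{N} = 2\sin^2\frac{\pi}{2N} \approx \frac{\pi^2}{2N^2}, \qquad x_{N/2+1} - x_{N/2} = \sin\frac{\pi}{N} \approx \frac{\pi}{N} \quad (N \text{ even}).$$

An element near a corner of the reference cube therefore has all sides of order $1/N^2$, an element near the center has all sides of order $1/N$, and an element next to the middle of an edge mixes both scales, with an aspect ratio of order N.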

GENERAL SETUP AND SIMPLIFICATIONS 

Let $u_N$ be a function belonging to $P^N(\hat\Omega)$, and let $u_h = I_N^h u_N$ be the function of 
$P^h(\hat\Omega)$ for which 

$$u_h(x_G) = u_N(x_G),$$

for every GLL point $x_G$ in $\hat\Omega$. Then 

$$|u_h|^2_{H^1(\hat\Omega)} \asymp |u_N|^2_{H^1(\hat\Omega)} \asymp a_Q(u_N,u_N), \tag{5}$$

and 

$$\|u_h\|^2_{L^2(\hat\Omega)} \asymp \|u_N\|^2_{L^2(\hat\Omega)} \asymp (u_N,u_N)_N, \tag{6}$$

where $a_Q$ is given by (2) and (3) with $J_i = 1$; see [4] and [2]. We remark that these 
results and generalizations for other boundary conditions were obtained independently 
by Parter and Rothman [14]. The basis of these results is the $H^1$ stability of the 
interpolation operator at the GLL nodes for functions of $H^1([-1,+1])$, proved by 
Bernardi and Maday [1], [2]. 

Consider now a function v defined in a substructure $\Omega_i$ with diameter of order H. 
Changing variables to the reference substructure by $\hat v(\hat x) = v(F_i(\hat x))$ and using simple 
estimates on the Jacobian of $F_i$, we obtain 

$$\|v\|^2_{L^2(\Omega_i)} \asymp H^{d}\,\|\hat v\|^2_{L^2(\hat\Omega)} \tag{7}$$

and 

$$|v|^2_{H^1(\Omega_i)} \asymp H^{d-2}\,|\hat v|^2_{H^1(\hat\Omega)}, \tag{8}$$

where the dimension d is equal to 1, 2, or 3. 

These estimates can be interpreted as spectral equivalence of the stiffness and mass 
matrices generated by the norms and basis of the discrete spaces introduced above. 
Indeed, the nodal basis $\{\hat\phi_j^h\}$ is mapped by interpolation at the GLL nodes to a nodal 
basis of $P^N(\hat\Omega)$. Then, (5) can be written as 

$$u^T \hat K_h u \asymp u^T \hat K_N u, \tag{9}$$

where u is the vector of nodal values of both $u_N$ and $u_h$, and $\hat K_h$ and $\hat K_N$ are the stiffness 
matrices corresponding to $|\cdot|^2_{H^1(\hat\Omega)}$ and $a_Q(\cdot,\cdot)$. 

Therefore, if $K_h^{(i)}$ and $K_N^{(i)}$ are the stiffness matrices generated by the bases $\{\phi_j^h\}$ 
and $\{\phi_j^N\}$, respectively, for all nodes j in the closure of $\Omega_i$, and by $|\cdot|^2_{H^1(\Omega_i)}$ and 
$a_{Q,\Omega_i}(\cdot,\cdot)$, then 

$$u^T K_h^{(i)} u \asymp u^T K_N^{(i)} u,$$

where u is the vector of nodal values, by (9), (8), and (5). The stiffness matrices $K_N$ 
and $K_h$ are formed by subassembly [7], 

$$u^T K_h u = \sum_{i=1}^{M} u^{(i)T} K_h^{(i)} u^{(i)} \tag{10}$$

for any nodal vector u, where the $u^{(i)}$ are the subvectors of nodal values in $\bar\Omega_i$; an 
analogous expression holds for $K_N$. These last two relations imply that 

$$u^T K_h u \asymp u^T K_N u, \tag{11}$$

for any vector u. All these matrix equivalences and their analogues in terms of norms 
are hereafter called the FEM-SEM equivalence. 
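
The FEM-SEM equivalence can be observed numerically in one dimension. The sketch below is an illustrative experiment and not part of the paper: it builds the spectral stiffness matrix from GLL quadrature and the piecewise linear stiffness matrix on the GLL mesh of $[-1,+1]$, imposes homogeneous Dirichlet conditions, and returns the generalized eigenvalues of the pair. All function names, the barycentric differentiation matrix, and the one-dimensional setting are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gll(N):
    """GLL nodes and weights on [-1, 1] (interior nodes = roots of L_N')."""
    k = np.arange(1, N - 1)
    off = np.sqrt(k * (k + 2.0) / ((2.0 * k + 1.0) * (2.0 * k + 3.0)))
    interior = np.linalg.eigvalsh(np.diag(off, 1) + np.diag(off, -1))
    x = np.concatenate(([-1.0], interior, [1.0]))
    c = np.zeros(N + 1)
    c[N] = 1.0
    rho = 2.0 / (N * (N + 1) * leg.legval(x, c) ** 2)
    return x, rho


def diff_matrix(x):
    """Differentiation matrix of the Lagrange basis on the nodes x."""
    n = len(x)
    dx = x[:, None] - x[None, :] + np.eye(n)   # identity avoids 0 on the diagonal
    w = 1.0 / np.prod(dx, axis=1)              # barycentric weights
    D = (w[None, :] / w[:, None]) / dx
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))        # rows of D annihilate constants
    return D


def fem_sem_spectrum(N):
    """Eigenvalues of K_h^{-1} K_N on the interior GLL nodes (1D model)."""
    x, rho = gll(N)
    D = diff_matrix(x)
    K_N = D.T @ (rho[:, None] * D)             # spectral stiffness via GLL quadrature
    K_h = np.zeros((N + 1, N + 1))             # piecewise linear stiffness
    for e, h in enumerate(np.diff(x)):
        K_h[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    inner = slice(1, N)                        # drop the two boundary nodes
    L_inv = np.linalg.inv(np.linalg.cholesky(K_h[inner, inner]))
    return np.linalg.eigvalsh(L_inv @ K_N[inner, inner] @ L_inv.T)


if __name__ == "__main__":
    for N in (4, 8, 16):
        lam = fem_sem_spectrum(N)
        print(N, lam.min(), lam.max())
```

If the extreme eigenvalues printed for increasing N remain within a fixed interval, this is the one-dimensional counterpart of (11).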

We next show that the same reasoning applies to the Schur complements $S_h$ and 
$S_N$, i.e., the matrices obtained by eliminating the interior nodes of each $\Omega_i$ in a classical 
way; see [7]. We say that $u_N$ is Q-discrete (piecewise) harmonic if $a_Q(u_N,v_N) = 0$ for all 
i and all $v_N$ belonging to $P_0^N(\Omega_i)$. The definition of h-discrete (piecewise) harmonic 
functions is analogous. It is easy to see that $u^T S_N u = a_Q(u_N,u_N)$ and that $u^T S_h u = 
a(u_h,u_h)$, where $u_N$ and $u_h$ are, respectively, Q- and h-discrete harmonic and u is the 
vector of the nodal values on the interfaces of the substructures. 
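
The link between Schur complements and discrete harmonic extensions can be checked on a small matrix. The following sketch is generic linear algebra rather than the authors' code; the function names and the index lists `interior` and `boundary` are assumptions for illustration. It forms the Schur complement by eliminating the interior unknowns and verifies that its quadratic form equals the energy of the discrete harmonic extension.

```python
import numpy as np


def schur_complement(K, interior, boundary):
    """S = K_BB - K_BI K_II^{-1} K_IB for a symmetric positive definite K."""
    K_II = K[np.ix_(interior, interior)]
    K_IB = K[np.ix_(interior, boundary)]
    K_BB = K[np.ix_(boundary, boundary)]
    return K_BB - K_IB.T @ np.linalg.solve(K_II, K_IB)


def harmonic_extension(K, interior, boundary, u_B):
    """Extend boundary values u_B so that the interior equations are satisfied."""
    u = np.zeros(K.shape[0])
    u[boundary] = u_B
    K_II = K[np.ix_(interior, interior)]
    K_IB = K[np.ix_(interior, boundary)]
    u[interior] = -np.linalg.solve(K_II, K_IB @ u_B)
    return u


if __name__ == "__main__":
    n = 6
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    interior, boundary = [1, 2, 3, 4], [0, 5]
    u_B = np.array([1.0, -2.0])
    S = schur_complement(K, interior, boundary)
    u = harmonic_extension(K, interior, boundary, u_B)
    print("u_B^T S u_B :", u_B @ S @ u_B)
    print("u^T K u     :", u @ K @ u)
```

Any other extension of the same boundary values has a larger energy, which is the minimizing property used in the argument below.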

The matrices $S_h$ and $S_N$ are spectrally equivalent. Indeed, by the subassembly 
equation (10), it is enough to verify the spectral equivalence for each substructure 
separately. For the substructure $\Omega_i$, we find: 

$$u^T S_N^{(i)} u = a_{Q,\Omega_i}(u_N,u_N) \succeq a_{\Omega_i}(I_N^h(u_N), I_N^h(u_N)) \ge a_{\Omega_i}(\mathcal{H}_h(I_N^h u_N), \mathcal{H}_h(I_N^h u_N)) = a_{\Omega_i}(u_h,u_h) = u^T S_h^{(i)} u, \tag{12}$$

where $I_N^h$ is the interpolation at the nodes of $\mathcal{T}^h$, $\mathcal{H}_h$ is the h-discrete harmonic 
extension of the interface values, and the subscript $\Omega_i$ indicates the restriction of the 
bilinear form to this substructure. Here, we have used the FEM-SEM equivalence and 
the well-known minimizing property of the discrete harmonic extension. The reverse 
inequality is obtained in an analogous way. 

In his Master's thesis [13], Pahl proposed the use of easily invertible finite element 
preconditioners $B_h$ and $S_{h,WB}$ for $K_h$ and $S_h$, respectively. If the condition number 
satisfies 

$$\kappa(B_h^{-1} K_h) \le C(N) \tag{13}$$

with a moderately increasing function C(N), then a simple Rayleigh quotient 
argument shows that $\kappa(B_h^{-1} K_N) \preceq C(N)$, with an analogous bound for $S_{h,WB}$ and $S_N$. 
Since the evaluation of the action of $B_h^{-1}$ and $S_{h,WB}^{-1}$ is much cheaper, these are very 
efficient preconditioners. 
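
The Rayleigh quotient argument can be written out as follows, assuming that the preconditioner is symmetric positive definite and writing the FEM-SEM equivalence (11) with explicit constants $c\,u^T K_h u \le u^T K_N u \le C\,u^T K_h u$. For any nonzero vector u,

$$\frac{u^T K_N u}{u^T B_h u} = \frac{u^T K_N u}{u^T K_h u}\cdot\frac{u^T K_h u}{u^T B_h u},$$

so that

$$\lambda_{\max}(B_h^{-1}K_N) \le C\,\lambda_{\max}(B_h^{-1}K_h), \qquad \lambda_{\min}(B_h^{-1}K_N) \ge c\,\lambda_{\min}(B_h^{-1}K_h),$$

and therefore

$$\kappa(B_h^{-1}K_N) \le \frac{C}{c}\,\kappa(B_h^{-1}K_h) \le \frac{C}{c}\,C(N).$$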

Therefore, we only need to establish (13) and its analogue for $S_h$ and $S_{h,WB}$. We 
note that the triangulation $\mathcal{T}^h$ is nonregular, and that all the bounds of this form 
for Schwarz preconditioners established in the literature require some kind of inverse 
condition or regularity of the triangulation, which does not hold for the GLL mesh. In 
this paper we only analyze the iterative substructuring algorithms, but remark that the 
analysis for overlapping methods is a straightforward consequence of our techniques. 

SOME ESTIMATES FOR NONREGULAR TRIANGULATIONS 

We state here all the estimates necessary to extend the technical tools developed in 
[7] to the case of nonregular hexahedral triangulations. We let $\hat K = [-1,+1]^3$ be 
the reference element and K be its image under an affine mapping F. $K \subset \hat\Omega$ is an 
element of the triangulation $\mathcal{T}^h$ with sides $h_1$, $h_2$ and $h_3$. The function u is a piecewise 
trilinear ($Q_1$) function defined in K. Notice that in this subsection we use hats to 
represent functions and points of $\hat K$. 

The first result concerns the expressions of the $L^2$ and $H^1$ norms in terms of the 
nodal values. Let $e_i$ be one of the coordinate directions of $\hat K$, let $\hat a$, $\hat b$, $\hat c$ and $\hat d$ be 
the nodes on one of the faces that is perpendicular to $e_i$, and let $\hat a'$, $\hat b'$, etc. be the 
corresponding points on the parallel face. The notation $\hat x_\alpha$ denotes a generic node of 
$\hat K$, and $a$, $a'$ are the images of $\hat a$ and $\hat a'$, etc. The next lemma follows by changing 
variables, and by using the equivalence of any pair of norms in the finite dimensional 
space $Q_1(\hat K)$. 

Lemma 1. 

$$\|u\|^2_{L^2(K)} \asymp h_1 h_2 h_3 \sum_{x_\alpha} (u(x_\alpha))^2 \tag{14}$$

$$\|\partial_{x_i} u\|^2_{L^2(K)} \asymp \frac{h_1 h_2 h_3}{h_i^2} \sum_{x_\alpha = a,b,c,d} (u(x_\alpha) - u(x_\alpha'))^2 \tag{15}$$
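
For a rectangular box the constants in (14) can be made explicit. This worked example is an illustration and does not appear in the paper. The mass matrix of the trilinear nodal basis on a box with sides $h_1$, $h_2$, $h_3$ is a Kronecker product of one-dimensional mass matrices,

$$M_K = h_1 h_2 h_3 \; M_1 \otimes M_1 \otimes M_1, \qquad M_1 = \frac{1}{6}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},$$

and the eigenvalues of $M_1$ are $1/6$ and $1/2$. Hence

$$\frac{h_1 h_2 h_3}{216} \sum_{x_\alpha} (u(x_\alpha))^2 \;\le\; \|u\|^2_{L^2(K)} \;\le\; \frac{h_1 h_2 h_3}{8} \sum_{x_\alpha} (u(x_\alpha))^2,$$

with constants that do not depend on the aspect ratio of the box, which is the point of the lemma for the GLL mesh.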

In the next lemma we give a bound on the gradient of a trilinear function in 
terms of bounds on the difference of the values at the nodes (vertices). The proof is 
elementary and is omitted. 

Lemma 2. Let u be trilinear in the element K such that $|u(a) - u(b)| \le C|a - b|/r$ 
for some constants C and r, and for any two vertices a and b of the element K. Then 

$$|\nabla u| \le \frac{C}{r}.$$


Lemma 3. Let u be a trilinear function defined in K, and let $\vartheta$ be a $C^1$ function 
such that $|\nabla\vartheta| \le C/r$ and $|\vartheta| \le C$ for some constants C and r. Then 

$$\|\nabla I^h(\vartheta u)\|^2_{L^2(K)} \le C\left(|u|^2_{H^1(K)} + r^{-2}\|u\|^2_{L^2(K)}\right). \tag{16}$$

Here C is independent of all the parameters, and $I^h$ is the interpolation to a $Q_1$ 
function of the values in the vertices of K. 

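Proof. By equation (15), and letting $h_1$, $h_2$, and $h_3$ be the sides of the element K: 

$$\|\partial_{x_i} I^h(\vartheta u)\|^2_{L^2(K)} \asymp \frac{h_1 h_2 h_3}{h_i^2} \sum_{x = a,b,c,d} \left(u(x)\vartheta(x) - u(x')\vartheta(x')\right)^2.$$

Each term in the sum above can be bounded by 

$$\left(u(x)\vartheta(x) - u(x)\vartheta(x') + u(x)\vartheta(x') - u(x')\vartheta(x')\right)^2 \le 2\left\{(u(x))^2(\vartheta(x) - \vartheta(x'))^2 + (u(x) - u(x'))^2(\vartheta(x'))^2\right\}.$$

The bound on $\nabla\vartheta$ implies that $|\vartheta(x) - \vartheta(x')| \le C h_i/r$, and therefore 

$$\|\partial_{x_i} I^h(\vartheta u)\|^2_{L^2(K)} \preceq \frac{h_1 h_2 h_3}{h_i^2} \sum_{x = a,b,c,d} (u(x) - u(x'))^2 + \frac{h_1 h_2 h_3}{r^2} \sum_{x = a,b,c,d} (u(x))^2 \preceq |u|^2_{H^1(K)} + r^{-2}\|u\|^2_{L^2(K)}, \tag{17}$$

since $\vartheta$ is bounded. 

□ 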
TECHNICAL TOOLS 


We introduce notations related to certain geometrical objects, since the iterative 
substructuring algorithms are based on subspaces directly related to the interiors of the 
substructures, the faces, edges and vertices. Let $\Omega_{ij}$ be the union of two substructures 
$\Omega_i$ and $\Omega_j$, which share a common face $\mathcal{F}_k$. Let $W_j$ represent the wirebasket of the 
subdomain $\Omega_j$, which is the union of all the edges and vertices of this subdomain. We 
note that a face in the interior of the region $\Omega$ is common to exactly two substructures, 
an interior edge is shared by more than two, and an interior vertex is common to still 
more substructures. All the substructures, faces, and edges are regarded as open sets. 

The preconditioner $S_{h,WB}$ that we use is defined by subassembly of the matrices 
$S_{h,WB}^{(i)}$. Therefore we can restrict our analysis to one substructure. The results for the 
whole domain follow by a standard Rayleigh quotient argument. It is also enough to 
estimate the preconditioning of $S_h$ by $S_{h,WB}$, because these results can be translated 
into results for each substructure by the equivalences (5), (7), and (8). 

The assumption that the $F_i$ are arbitrary smooth mappings improves the 
flexibility of the triangulation, but does not make the situation essentially different 
from the case of affine mappings. Therefore, without loss of generality, we assume, 
from now on, that the $F_i$ are affine mappings. 

In some of the following results, we state the result for substructures of diameter 
proportional to H, but prove the theorem only for a reference substructure. The 
introduction of the scaling factors into the final formulas is routine. 

Lemma 4. Let $\bar u^h_{W_j}$ be the average value of $u_h$ on $W_j$, the wirebasket of the 
subdomain $\Omega_j$. Then 

$$\|u_h\|^2_{L^2(W_j)} \le C(1 + \log(N))\,\|u_h\|^2_{H^1(\Omega_j)},$$

and 

$$\|u_h - \bar u^h_{W_j}\|^2_{L^2(W_j)} \le C(1 + \log(N))\,|u_h|^2_{H^1(\Omega_j)}.$$

Similar bounds also hold for an individual substructure edge. 

Proof. In the reference substructure, we know that $P^h \subset V^h$, where $V^h$ is a 
standard $Q_1$ finite element space defined on a shape regular triangulation that includes 
$\mathcal{T}^h$. This can be done by refining appropriately all the elements of $\mathcal{T}^h$ with sides larger 
than, for example, $3\hat h/2$. 

Now we apply the well-known result for shape regular triangulations, lemma 4.3 
in [7], to get both estimates, recalling that in the reference substructure $\hat h \asymp 1/N^2$. 


□ 

In the abstract Schwarz convergence theory, the crucial point in the estimate of 
the rate of convergence of the algorithm is to demonstrate that all functions in the 
finite element space can be decomposed into components belonging to the subspaces 
in such a way that the sum of the resulting energies is uniformly, or almost uniformly, 
bounded with respect to the parameters H and N. The main technique for deriving 
such a decomposition is the use of a suitable partition of unity. In the next two 
lemmas, we explicitly construct such a partition. 

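Lemma 5. Let $\mathcal{F}_k$ be the common face between $\Omega_i$ and $\Omega_j$, and let $\theta_{\mathcal{F}_k}$ be the 
function in $P^h(\Omega)$ that is equal to one at the interior nodes of $\mathcal{F}_k$, zero on the remainder 
of $(\partial\Omega_i \cup \partial\Omega_j)$, and discrete harmonic in $\Omega_i$ and $\Omega_j$. Then 

$$|\theta_{\mathcal{F}_k}|^2_{H^1(\Omega_i)} \le C(1 + \log(N))\,H.$$

The same bound also holds for the other subregion $\Omega_j$. 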

Proof. We define the functions $\hat\theta_{\mathcal{F}_k}$ and $\hat\vartheta_{\mathcal{F}_k}$ in the reference cube; $\theta_{\mathcal{F}_k}$ and $\vartheta_{\mathcal{F}_k}$ 
are obtained, as usual, by mapping. We construct a function $\hat\vartheta_{\mathcal{F}_k}$ having the same 
boundary values as $\hat\theta_{\mathcal{F}_k}$, and then prove the bound for the former. The standard energy 
minimizing property of discrete harmonic extensions then implies the bound for $\hat\theta_{\mathcal{F}_k}$. 
The six functions $\hat\vartheta_{\mathcal{F}_k}$ which correspond to the six faces of the cube also form a partition 
of unity at all nodes at the closure of the substructure except those on the wirebasket; 
this property is used in the next lemma. 

We divide the substructure into twenty-four tetrahedra by connecting its center 
C to all the vertices and to all the six centers $C_k$ of the faces, and by drawing the 
diagonals of the faces of $\hat\Omega$; see Fig. 1. 

The function $\hat\vartheta_{\mathcal{F}_k}$ associated with the face $\mathcal{F}_k$ is defined as being 1/6 at the point 
C. The values at the centers of the faces are defined by $\hat\vartheta_{\mathcal{F}_k}(C_j) = \delta_{jk}$, where $\delta_{jk}$ is 
the Kronecker symbol. $\hat\vartheta_{\mathcal{F}_k}$ is defined to be linear on the segments $CC_j$ for $j = 1, \dots, 6$. 
The values inside each subtetrahedron defined by a segment $CC_j$ and one edge of the 
cube are defined to be constant on the intersection of any plane through that edge and 
are given by the value, already known, at the segment $CC_j$. The values at the edge of 
the cube belonging to this subtetrahedron are then modified to be equal to zero. Next, 
the whole function $\hat\vartheta_{\mathcal{F}_k}$ is modified to be a piecewise $Q_1$ function by interpolating at 
all the GLL nodes of the reference cube. 

We claim that $|\nabla\hat\vartheta_{\mathcal{F}_k}(x)| \le C/r$, where x is a point belonging to any element K 
that does not touch any edge of the cube, and r is the distance between the center of 
K and the closest edge of the cube. Let ab be a side of K. We analyze in detail the 
situation depicted in Fig. 2, where ab is parallel to $CC_k$. Let e be the intersection of 
the plane containing these two segments with the edge of the cube that is closest to 
ab. Then $|\hat\vartheta_{\mathcal{F}_k}(b) - \hat\vartheta_{\mathcal{F}_k}(a)| \preceq D$, by construction of $\hat\vartheta_{\mathcal{F}_k}$, where D is the size of the 
radial projection of ab on $CC_k$. By similarity of triangles, we may write: 

$$|\hat\vartheta_{\mathcal{F}_k}(b) - \hat\vartheta_{\mathcal{F}_k}(a)| \preceq \frac{|b - a|}{r'}, \tag{18}$$

where r' is the distance between e and the midpoint of ab. Here we have used that the 
distance between e and $CC_k$ is of order 1. If the segment ab is not parallel to $CC_k$, 
the difference $|\hat\vartheta_{\mathcal{F}_k}(b) - \hat\vartheta_{\mathcal{F}_k}(a)|$ is even smaller, and (18) is still valid. Notice that r' 
is within a multiple of 2 of r. Therefore Lemma 2 implies that $|\nabla\hat\vartheta_{\mathcal{F}_k}(x)| \le C/r$. 






Figure 1: One of the segments CCk 

In order to estimate the energy of $\hat\vartheta_{\mathcal{F}_k}$, we start with the elements K that touch 
one of the edges of the face $\mathcal{F}_k$. Let $h_3$ be the largest side of one of these elements. 
Since the nodal values of $\hat\vartheta_{\mathcal{F}_k}$ at K are 0, 1, and 1/6, 

$$|\hat\vartheta_{\mathcal{F}_k}|^2_{H^1(K)} \le C h_3,$$

by a simple use of equation (15). By summing over K, we conclude that the energy 
of $\hat\vartheta_{\mathcal{F}_k}$ is bounded independently of N for the union of all elements that touch one of 
the edges of the face $\mathcal{F}_k$. 

To estimate the contribution to the energy from the rest of the substructure, we 
consider one subtetrahedron at a time and introduce cylindrical coordinates using the 
substructure edge that belongs to the subtetrahedron as the z-axis. The bound now 
follows from the bound on the gradient given above and from elementary 
considerations. We refer to [7] for more details. 

□ 

The following lemma corresponds to Lemma 4.5 in [7]. This lemma and the 
previous one are the keys for avoiding $H_{00}^{1/2}$ estimates and extension theorems. 

Lemma 6. Let $\vartheta_{\mathcal{F}_k}(x)$ be the function introduced in the proof of Lemma 5, let $\mathcal{F}_k$ be 
a face of the substructure $\Omega_j$, and let $I^h$ denote the interpolation operator associated 
with the finite element space $P^h$ and the image of the GLL points under the mapping 
$F_j$. Then, 

$$\sum_k I^h(\vartheta_{\mathcal{F}_k} u)(x) = u(x)$$

for all nodal points $x \in \bar\Omega_j$ that do not belong to the wirebasket $W_j$, and 

$$|I^h(\vartheta_{\mathcal{F}_k} u)|^2_{H^1(\Omega_j)} \le C(1 + \log(N))^2\,\|u\|^2_{H^1(\Omega_j)}.$$

Figure 2: Geometry underlying equation (18) 

Proof. The first part is trivial from the construction of $\vartheta_{\mathcal{F}_k}$ made in the previous 
lemma. For the second part, we first estimate the sum of the energy of all the elements 
K that touch the wirebasket. The nodal values of the interpolant $I^h(\vartheta_{\mathcal{F}_k} u)$ in such 
an element are 0, 0, 0, 0, u(a), u(b), $\vartheta_{\mathcal{F}_k}(c)u(c)$ and $\vartheta_{\mathcal{F}_k}(d)u(d)$; $\vartheta_{\mathcal{F}_k}$ lies between 0 and 
1. Moreover, we denote by $h_3$ the side of K that is larger than the other two sides 
$h_1$ and $h_2 = h_1$. Note that this larger side is parallel to the closest wirebasket edge. 
Since $h_1 \le h_3$, and using equation (15), we obtain: 

$$|I^h(\vartheta_{\mathcal{F}_k} u)|^2_{H^1(K)} \le C h_3 \left(u^2(a) + u^2(b) + (\vartheta_{\mathcal{F}_k}(c)u(c))^2 + (\vartheta_{\mathcal{F}_k}(d)u(d))^2\right).$$

Then, by using the expression of the $L^2$-norm in the two segments that are parallel to 
the edge, and Lemma 4, we have: 

$$\sum_K |I^h(\vartheta_{\mathcal{F}_k} u)|^2_{H^1(K)} \le C(1 + \log(N))\,\|u\|^2_{H^1(\Omega_j)},$$

where the sum is taken over all elements K that touch the boundary of the face $\mathcal{F}_k$. 

We next bound the energy of the interpolant for the other elements. Since $|\nabla\vartheta_{\mathcal{F}_k}| \le 
C/r$, where r is the distance between the element K and the nearest edge of $\hat\Omega$ (see 
the proof of the previous lemma), Lemma 3 implies that 

$$\sum_{K \subset \hat\Omega} |I^h(\vartheta_{\mathcal{F}_k} u)|^2_{H^1(K)} \le C \sum_{K \subset \hat\Omega} \left(|u|^2_{H^1(K)} + r^{-2}\|u\|^2_{L^2(K)}\right),$$

where the sum is taken over all elements K that do not touch the edges of $\hat\Omega$. 

The bound of the first term in the sum is trivial. To bound the second term, we 
partition the elements of $\hat\Omega$ into groups, in accordance with the closest edge of $\hat\Omega$; 
the exact rule for the assignment of the elements that are halfway between is of no 
importance. For each edge of the wirebasket, we use a local cylindrical coordinate 
system with the z axis coinciding with the edge, and the radial direction, r, normal to 
the edge. In cylindrical coordinates, we estimate the sum by an integral 

$$\sum_{K \subset \hat\Omega} r^{-2}\|u\|^2_{L^2(K)} \le C \int_{r=h}^{C}\int_\theta\int_z \frac{u^2}{r^2}\, r\,dr\,d\theta\,dz.$$

The integral with respect to z can be bounded by using Lemma 4. We obtain 

$$\sum_{K \subset \hat\Omega} r^{-2}\|u\|^2_{L^2(K)} \le C(1 + \log(C/h))\,\|u\|^2_{H^1(\hat\Omega)} \int_{r=h}^{C} r^{-1}\,dr$$

and thus 

$$\sum_{K \subset \hat\Omega} |I^h(\vartheta_{\mathcal{F}_k} u)|^2_{H^1(K)} \le C(1 + \log(C/h))^2\,\|u\|^2_{H^1(\hat\Omega)}.$$

□ 

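Lemma 7. Let $\bar u^h_{\partial\mathcal{F}_k}$ and $\bar u^h_{W_k}$ be the averages of $u_h$ on $\partial\mathcal{F}_k$ and $W_k$, respectively. 
Then, 

$$(\bar u^h_{\partial\mathcal{F}_k})^2 \le \frac{C}{H}\,\|u_h\|^2_{L^2(\partial\mathcal{F}_k)},$$

$$(\bar u^h_{W_k})^2 \le \frac{C}{H}\,\|u_h\|^2_{L^2(W_k)}.$$

The proofs are direct consequences of the Cauchy-Schwarz inequality. 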
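
Written out for the first estimate, and using only that the length of $\partial\mathcal{F}_k$ is of order H, the Cauchy-Schwarz step reads

$$(\bar u^h_{\partial\mathcal{F}_k})^2 = \left(\frac{1}{|\partial\mathcal{F}_k|}\int_{\partial\mathcal{F}_k} u_h \, ds\right)^2 \le \frac{1}{|\partial\mathcal{F}_k|^2}\left(\int_{\partial\mathcal{F}_k} 1 \, ds\right)\left(\int_{\partial\mathcal{F}_k} u_h^2 \, ds\right) = \frac{1}{|\partial\mathcal{F}_k|}\,\|u_h\|^2_{L^2(\partial\mathcal{F}_k)}, \qquad |\partial\mathcal{F}_k| \asymp H.$$

The estimate for the wirebasket average follows in the same way.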

Lemma 8. Let $u_h$ be zero on the mesh points of the faces of $\Omega_j$ and discrete 
harmonic in $\Omega_j$. Then 

$$|u_h|^2_{H^1(\Omega_j)} \le C\,\|u_h\|^2_{L^2(W_j)}.$$

This result follows by estimating the energy norm of the zero extension of the 
boundary values by means of equation (15) and by noting that the harmonic extension 
has a smaller energy. 

ITERATIVE SUBSTRUCTURING ALGORITHMS 

The first algorithm we analyze is a wirebasket based method, based on Algorithm 6.4 
in [7]. This is a block-diagonal preconditioner after transforming the original matrix 
to a convenient basis. 

To use the abstract framework of Schwarz methods [7], we only need to prescribe 
spaces whose union is the whole space, and the corresponding bilinear forms. 

Each internal face $\mathcal{F}_k$ generates a local space $V_{\mathcal{F}_k}$ of all the h-discrete harmonic 
functions that are zero at all the interface nodes that do not belong to this face. Notice 
that the functions belonging to $V_{\mathcal{F}_k}$ have support in the union of the two substructures 
$\Omega_i$ and $\Omega_j$ that share the face $\mathcal{F}_k$. The bilinear form used for this space is $a(\cdot,\cdot)$. 

We also define a wirebasket subspace that is the range of the following interpolation 
operator: 

$$I_W^h u_h = \sum_{x_k \in W_h} u_h(x_k)\,\varphi_k + \sum_k \bar u^h_{\partial\mathcal{F}_k}\,\theta_{\mathcal{F}_k}.$$

Here, $\varphi_k$ is the discrete harmonic extension of the standard nodal basis function $\phi_k$, 
$W_h$ is the set of nodes in the union of all the wirebaskets, and $\bar u^h_{\partial\mathcal{F}_k}$ is the average of 
$u_h$ on $\partial\mathcal{F}_k$. The bilinear form for this coarse subspace is given by 

$$b_0(u,u) = (1 + \log(N)) \sum_i k_i \inf_{c_i} \|u - c_i\|^2_{L^2(W_i)}.$$



These subspaces and bilinear forms define, via the Schwarz framework, a 
preconditioner of $S_h$ that we call $S_{h,WB}$. 

Theorem 1. For the preconditioner $S_{h,WB}$, we have 

$$\kappa(S_{h,WB}^{-1} S_N) \le C(1 + \log(N))^2,$$

where the constant C is independent of N, H, and the values $k_i$ of the coefficient. 

Proof. We apply, word by word, the proof of theorem 6.4 in [7] to the matrix $S_h$, 
using the tools developed in the previous section. This gives 

$$\kappa(S_{h,WB}^{-1} S_h) \le C(1 + \log(N))^2.$$

The harmonic FEM-SEM equivalence (12) and a Rayleigh quotient argument complete 
the proof. 


□ 

We do not give the complete proof here because it would be a mere restatement 
of the proof in [7]. 

The next algorithm is obtained from the previous one by the discrete harmonic 
FEM-SEM equivalence, by which we find a preconditioner $S_{N,WB}$ from the 
preconditioner $S_{h,WB}$ studied above. Each face subspace, related to a face $\mathcal{F}_k$, is composed 
of the set of all Q-discrete harmonic functions that are zero at all the interface nodes 
that do not belong to the interior of the face $\mathcal{F}_k$. 

The wirebasket subspaces are defined as before, by prescribing the values at the 
GLL nodes on a face to be equal to the average of the function on the boundary of 
the face. The bilinear forms used for the face and wirebasket subspaces are $a_Q(\cdot,\cdot)$ 
and $b_0(\cdot,\cdot)$, respectively. Notice that this is the wirebasket method based on GLL 
quadrature given in [19]. 

The following lemma shows the equivalence of the two functions $u_N$ and $u_h$ with 
respect to the bilinear form $b_0(\cdot,\cdot)$. 

Lemma 9. Let $u_h$ be a $Q_1$ finite element function on the GLL mesh of the interval 
$I = [-1,+1]$, and let $u_N$ be its polynomial interpolant. Then 

$$\inf_c \|u_h - c\|_{L^2(I)} \asymp \inf_c \|u_N - c\|_{L^2(I)}.$$

Proof. We prove only the $\preceq$ part. The inequality without the infimum is valid for the 
constant $c_r$ that realizes the inf in the right hand side by the FEM-SEM equivalence. 
By taking the inf in the left hand side we preserve the inequality. 

□ 


Theorem 2. For the preconditioner $S_{N,WB}$, we have 

$$\kappa(S_{N,WB}^{-1} S_N) \le C(1 + \log(N))^2,$$

where the constant is independent of the parameters H, N and the values $k_i$ of the 
coefficient. 

Proof. In this proof, the functions with indices h and N are all discrete 
harmonic functions with respect to the appropriate norms, related in the same way as 
$u_N$ and $u_h$, i.e. $u_h = I_N^h u_N$. According to equation (10), it is enough to analyze 
one substructure $\Omega_i$ at a time, and prove the following equivalence: 

$$b_{0,W_i}(u_N,u_N) + \sum_{\mathcal{F}_k \subset \partial\Omega_i} |u_N - \bar u_{N,\partial\mathcal{F}_k}\theta_{N,\mathcal{F}_k}|^2_{H^1(\Omega_i)} \asymp b_{0,W_i}(u_h,u_h) + \sum_{\mathcal{F}_k \subset \partial\Omega_i} |u_h - \bar u_{h,\partial\mathcal{F}_k}\theta_{h,\mathcal{F}_k}|^2_{H^1(\Omega_i)}. \tag{19}$$

We prove only the $\preceq$ part; the proof of the other inequality is analogous. Lemma 
9 gives an upper bound of the first term of the left hand side by the corresponding 
term in the right hand side. 

Each term in the sum on the left hand side can be bounded by 

$$2\,|u_N - \bar u_{h,\partial\mathcal{F}_k}\theta_{N,\mathcal{F}_k}|^2_{H^1(\Omega_i)} + 2\,|(\bar u_{h,\partial\mathcal{F}_k} - \bar u_{N,\partial\mathcal{F}_k})\theta_{N,\mathcal{F}_k}|^2_{H^1(\Omega_i)}.$$

The first term of this expression can be bounded by the corresponding term on the 
right hand side by interpolation and the harmonic FEM-SEM equivalence. The second 
term is bounded by 

$$H(1 + \log(N))\,|\bar u_{h,\partial\mathcal{F}_k} - \bar u_{N,\partial\mathcal{F}_k}|^2 = H(1 + \log(N))\,|\overline{(u - c_{h,W_i})}_{h,\partial\mathcal{F}_k} - \overline{(u - c_{h,W_i})}_{N,\partial\mathcal{F}_k}|^2,$$

where $c_{h,W_i}$ is the average of $u_h$ over $W_i$. Here we have used the estimate on the 
energy norm of $\theta_{h,\mathcal{F}_k}$, which implies a similar estimate for $\theta_{N,\mathcal{F}_k}$. Applying the 
Cauchy-Schwarz inequality, as in Lemma 7, and the FEM-SEM equivalence, we can bound this 
last expression in terms of the first term in the right hand side of equation (19). 


□ 
Abstract 

Multigrid algorithms for nonconforming and mixed finite element methods 
for second order elliptic problems on triangular and rectangular finite elements 
are considered. The construction of several coarse-to-fine intergrid transfer 
operators for nonconforming multigrid algorithms is discussed. The equivalence 
between the nonconforming and mixed finite element methods with and without 
projection of the coefficient of the differential problems into finite element spaces 
is described. 


INTRODUCTION 


In this paper we consider multigrid algorithms for numerical solution of the model 
problem 


$$-\nabla \cdot (a\nabla u) = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \tag{1.1}$$

using nonconforming and mixed finite element methods, where $\Omega \subset \mathbb{R}^n$, n = 2, 3, is a 
simply connected bounded polygonal domain with the boundary $\partial\Omega$, $f \in L^2(\Omega)$, and 
the coefficient $a \in L^\infty(\Omega)$ satisfies 

$$0 < \alpha_1 \le a(x) \le \alpha_2, \quad x \in \Omega, \tag{1.2}$$

with fixed constants $\alpha_1$, $\alpha_2$. The W-cycle multigrid algorithm for numerically solving 
(1.1) using the $P_1$-nonconforming finite element method over triangles has been 
extensively studied in [6, 9, 15]. It has been shown that the W-cycle algorithm with 
a particular coarse-to-fine intergrid transfer operator (the so-called averaging oper- 
ator) is convergent under the assumption that the number of smoothing iterations 
on all levels is big enough. In [18] a convergence analysis for multigrid algorithms 
for (1.1) on triangular and rectangular elements, based on the abstract theory in [8] 
for multigrid methods with nonnested spaces, has been introduced. This analysis 
applies to both the W-cycle and the variable V-cycle. It was shown in [18] that opti- 
mal convergence properties of the W-cycle multigrid algorithm and uniform condition 
number estimates for the variable V-cycle preconditioner can be established with the 
averaging intergrid transfer operator. 

In this paper the V-cycle and W-cycle multigrid algorithms for numerically solving 
(1.1) using the nonconforming finite element method both over triangles and rectan- 
gles are considered in detail. Special attention is paid to the construction of several 
coarse-to-fine intergrid transfer operators for the nonconforming multigrid algorithms. 
In particular, we introduce a new intergrid transfer operator and indicate the conver- 
gence of the V-cycle algorithm, which has not been proved before. Our preliminary 
results show that a similar operator also works for the biharmonic problem. 

The multigrid algorithms for mixed finite element methods are also considered 
here. The mixed methods require the solution of linear systems in the form of a 
saddle point problem, which can be expensive to solve. An alternate approach was 
suggested by means of a nonmixed formulation. Namely, it has been shown that the 
mixed methods are equivalent to a modification of nonconforming Galerkin methods 
[1, 2, 14, 27]. The modified nonconforming methods yield a symmetric and positive 
definite problem (i.e., a minimization problem). However, various bubble functions 
have been used to establish the equivalence between the two methods, which can be 
again expensive from the computational point of view. In [15] a new approach has 
been introduced to establish the equivalence between the mixed and nonconforming 
methods without using the bubble functions. The projection of the coefficient a of the 
differential equation (1.1) into finite element spaces has been incorporated into the 
mixed formulation. Recently, we have been asked if the equivalence still holds without 
the coefficient projection. A positive answer will be given in this paper. In particular, 
a comparison between the usual and projected mixed methods is given, and we show 
that the latter version gives us considerable computational savings, without any loss 
of accuracy, as observed before [20]. 

The remainder of the paper is organized as follows. In the next section multigrid 
algorithms for the $P_1$-nonconforming method over triangles are developed. Then in 
the third section multigrid algorithms for triangular mixed methods are considered. 
An extension to the corresponding rectangular elements is carried out in the fourth 
section. Finally, numerical experiments on the performance of the present approaches 
are given in the fifth section. The later analysis is carried out for the two-dimensional 
case; it works for the three-dimensional case without substantial changes [15, 17, 16]. 

NONCONFORMING MULTIGRID ALGORITHMS 

Problem (1.1) is recast in weak form as follows. We define the bilinear form $a(\cdot,\cdot)$ 
by 

$$a(v,w) = (a\nabla v, \nabla w), \quad v, w \in H^1(\Omega),$$

where $(\cdot,\cdot)$ denotes the $L^2(\Omega)$ or $(L^2(\Omega))^2$ inner product, as appropriate. Then the 
weak form of (1.1) for the solution $u \in H_0^1(\Omega)$ is 

$$a(u,v) = (f,v), \quad \forall\, v \in H_0^1(\Omega). \tag{2.1}$$

For $0 < h < 1$, let $\mathcal{E}_h$ be a triangulation of $\Omega$ into triangles of size h and define 
the $P_1$-nonconforming finite element space 

$$V_h = \{v \in L^2(\Omega) : v|_E \text{ is linear for all } E \in \mathcal{E}_h,\ v \text{ is continuous at the midpoints of interior edges and vanishes at the midpoints of edges on } \partial\Omega\}.$$

Associated with $V_h$, we introduce a bilinear form on $V_h \oplus H_0^1(\Omega)$ by 

$$a_h(v,w) = \sum_{E \in \mathcal{E}_h} (a\nabla v, \nabla w)_E, \quad v, w \in V_h \oplus H_0^1(\Omega),$$

where $(\cdot,\cdot)_E$ is the $L^2(E)$ inner product. Then the $P_1$-nonconforming finite element 
discretization of (1.1) is to find $u_h \in V_h$ such that 

$$a_h(u_h,v) = (f,v), \quad \forall\, v \in V_h. \tag{2.2}$$

After we choose a basis in $V_h$, (2.2) leads to the following linear system: 

$$A_h U_h = F_h, \tag{2.3}$$

where $A_h$ is symmetric and positive definite. 

To develop a multigrid algorithm for (2.1), we need to assume a structure to our 
family of partitions. Let $h_0$ and $\mathcal{E}_{h_0} = \mathcal{E}_0$ be given. For each integer $1 \le k \le K$, let 
$h_k = 2^{-k}h_0$ and $\mathcal{E}_{h_k} = \mathcal{E}_k$ be constructed by connecting the midpoints of the edges 
of the triangles in $\mathcal{E}_{k-1}$, and let $\mathcal{E}_h = \mathcal{E}_K$ be the finest grid. We replace subscript $h_k$ 
simply by subscript k. 

Let $I_{k-1}^k : V_{k-1} \to V_k$ denote some as yet unspecified coarse-to-fine intergrid 
transfer operator. By an abuse of notation, we also denote by $I_{k-1}^k$ the matrix of this 
operator with respect to the bases $\{\psi_1^{k-1}, \dots\}$ of $V_{k-1}$ and $\{\psi_1^k, \dots\}$ of 
$V_k$, and by $I_k^{k-1} : V_k \to V_{k-1}$ the transpose of $I_{k-1}^k$. Finally, let $\omega_k$ indicate a parameter, 
which is chosen to be not smaller than the largest eigenvalue of $A_k$. 

We now formulate our multigrid algorithm for (2.3). The following algorithm
defines a multigrid operator $B_k : V_k \to V_k$.

Multigrid Algorithm 2.1. Let $1 \le k \le K$, and let $p$ be a positive integer. Set
$B_0 = A_0^{-1}$. Assume that $B_{k-1}$ has been defined and define $B_k g$ for $g \in V_k$ as follows:

1. Set $x^0 = 0$ and $q^0 = 0$.

2. Define $x^l$ for $l = 1, \ldots, m(k)$ by

   $$x^l = x^{l-1} + \omega_k^{-1}(g - A_k x^{l-1}).$$

3. Define $y^{m(k)} = x^{m(k)} + I_{k-1}^k q^p$, where $q^i$ for $i = 1, \ldots, p$ is defined by

   $$q^i = q^{i-1} + B_{k-1}\left[I_k^{k-1}(g - A_k x^{m(k)}) - A_{k-1} q^{i-1}\right].$$

4. Define $y^l$ for $l = m(k) + 1, \ldots, 2m(k)$ by

   $$y^l = y^{l-1} + \omega_k^{-1}(g - A_k y^{l-1}). \tag{2.4}$$

5. Set $B_k g = y^{2m(k)}$.

In Algorithm 2.1, $m(k)$ gives the number of smoothing iterations and can vary
as a function of $k$. If $p = 1$, we have a V-cycle multigrid algorithm. If $p = 2$, we
have a W-cycle algorithm. A variable V-cycle algorithm is one in which the number of
smoothings $m(k)$ increases exponentially as $k$ decreases (i.e., $p = 1$ and $m(k) = 2^{K-k}$).
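The five steps of Algorithm 2.1 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the names `mg_apply` and `build_1d_hierarchy` are hypothetical, the level matrices are dense, and the toy hierarchy uses one-dimensional conforming linear elements with linear interpolation rather than the $P_1$-nonconforming spaces and transfer operators discussed in this paper. Restriction is taken as the transpose of the coarse-to-fine matrix, and $\omega_k$ is set to the largest eigenvalue of $A_k$, as in the text.

```python
import numpy as np


def mg_apply(k, g, A, P, omega, m, p=1):
    """Return B_k g following Algorithm 2.1; A, P, omega, m are per-level lists."""
    if k == 0:
        return np.linalg.solve(A[0], g)            # B_0 = A_0^{-1}
    x = np.zeros(len(g))                           # step 1: x^0 = 0
    for _ in range(m[k]):                          # step 2: pre-smoothing
        x = x + (g - A[k] @ x) / omega[k]
    r = P[k].T @ (g - A[k] @ x)                    # residual restricted by the transpose
    q = np.zeros(len(r))                           # step 1: q^0 = 0
    for _ in range(p):                             # step 3: p = 1 (V) or p = 2 (W)
        q = q + mg_apply(k - 1, r - A[k - 1] @ q, A, P, omega, m, p)
    y = x + P[k] @ q
    for _ in range(m[k]):                          # step 4: post-smoothing
        y = y + (g - A[k] @ y) / omega[k]
    return y                                       # step 5: B_k g


def build_1d_hierarchy(K):
    """Toy hierarchy (an assumption): 1D conforming linear elements on (0, 1)."""
    A, P, omega = [], [None], []
    for k in range(K + 1):
        n = 2 ** (k + 1) - 1                       # interior nodes, h_0 = 1/2
        h = 1.0 / (n + 1)
        Ak = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
        A.append(Ak)
        omega.append(np.linalg.eigvalsh(Ak)[-1])   # largest eigenvalue of A_k
        if k > 0:
            nc = 2 ** k - 1                        # nodes on the next coarser level
            Pk = np.zeros((n, nc))
            for j in range(nc):                    # linear interpolation weights
                Pk[2 * j, j] = 0.5
                Pk[2 * j + 1, j] = 1.0
                Pk[2 * j + 2, j] = 0.5
            P.append(Pk)
    return A, P, omega


K = 4
A, P, omega = build_1d_hierarchy(K)
m_var = [2 ** (K - k) for k in range(K + 1)]       # variable V-cycle: m(k) = 2^{K-k}
g = np.ones(A[K].shape[0])
u = mg_apply(K, g, A, P, omega, m_var, p=1)
print(np.linalg.norm(g - A[K] @ u) / np.linalg.norm(g))  # relative residual after one cycle
```

Calling `mg_apply` with `p=2` and a constant list `m` gives the W-cycle; the same routine serves as a preconditioner inside a conjugate gradient iteration because the resulting operator is symmetric.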

We now consider the problem of how to construct a coarse-to-fine intergrid transfer
operator $I_{k-1}^k$. We first review three known operators, and then introduce a new one.

EXAMPLE 1. The first operator is the so-called averaging operator, which was
first defined in [6] and [9]. For $v \in V_{k-1}$, let $q$ be a midpoint of an edge of a triangle
in $\mathcal{E}_k$; then we define $I_{k-1}^k v$ by

$$(I_{k-1}^k v)(q) = \begin{cases} 0 & \text{if } q \in \partial\Omega, \\ v(q) & \text{if } q \notin \partial E \text{ for any } E \in \mathcal{E}_{k-1}, \\ \tfrac{1}{2}\{v|_{E_1}(q) + v|_{E_2}(q)\} & \text{if } q \in \partial E_1 \cap \partial E_2 \text{ for some } E_1, E_2 \in \mathcal{E}_{k-1}. \end{cases}$$

With this operator, as mentioned in the introduction, it was first shown in [6,
9] that the W-cycle algorithm (i.e., $p = 2$) is convergent under the assumption that
the number of smoothing iterations on all levels is big enough (following the standard
proof of convergence for conforming methods [4, 3]). Then in [18] a convergence
analysis for Algorithm 2.1 was given, which establishes optimal convergence properties
of the W-cycle multigrid algorithm and uniform condition number estimates for the
variable V-cycle preconditioner; see the theorem below. Since this operator does not
preserve the energy norm, the standard proof of convergence given in [5, 8] for the
conforming finite elements does not work for the nonconforming V-cycle. In fact,
while we can establish the stability property [6, 9, 15]

$$a_k(I_{k-1}^k v, I_{k-1}^k v) \le C\, a_{k-1}(v, v), \qquad \forall v \in V_{k-1}, \tag{2.5}$$

with $C$ independent of $k$, the constant $C$ is in general bigger than two, as observed
in [18].
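The case distinction in the averaging operator of Example 1 can be illustrated at a single fine-grid edge midpoint. The sketch below is a hypothetical helper (the name `averaging_transfer_value` and its arguments are not from the paper): it assumes the caller has already evaluated the coarse function on each coarse triangle whose closure contains the midpoint, so no mesh data structure is modeled.

```python
def averaging_transfer_value(traces, on_boundary):
    """Value of the transferred function at one fine-grid edge midpoint q.

    traces: values v|_E(q), one per coarse triangle E containing q
            (one value if q is interior to E, two if q lies on a shared coarse edge).
    on_boundary: True if q lies on the boundary of the domain.
    """
    if on_boundary:
        return 0.0                       # boundary midpoints carry the zero value
    if len(traces) not in (1, 2):
        raise ValueError("expected one or two coarse-element values")
    return sum(traces) / len(traces)     # plain value, or average of the two traces


print(averaging_transfer_value([1.0, 3.0], on_boundary=False))
```

The averaging in the two-trace case is what prevents the operator from preserving the energy norm, which is the reason the text falls back on the W-cycle and variable V-cycle theory.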

EXAMPLE 2. The second example was originally described in [33], and then
used in [17] for analyzing domain decomposition methods for mixed finite element
methods. If $v \in V_{k-1}$ and $E \in \mathcal{E}_{k-1}$ with the vertices $q_i$ and the midpoints $q_i'$ of its
edges, $i = 1, 2, 3$, then

$$(I_{k-1}^k v)(q_i') = v(q_i'), \qquad i = 1, 2, 3,$$

$$(I_{k-1}^k v)(q_i) = \begin{cases} \dfrac{1}{M_1} \sum_j v(q_j') & \text{if } q_i \notin \partial\Omega, \\[1ex] \dfrac{1}{M_2} \sum_j v(q_j'') & \text{if } q_i \in \partial\Omega, \end{cases}$$

where $M_1$ and $M_2$ are the number of the adjacent midpoints $q_j'$ and $q_j''$ to $q_i$ of the
interior edges and the boundary edges of the elements in $\mathcal{E}_{k-1}$, respectively.

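EXAMPLE 3. The third example [17, 21] is very similar to that in Example 2.
If $v \in V_{k-1}$ and $E \in \mathcal{E}_{k-1}$ with the vertices $q_i$ and the midpoints $q_i'$ of its edges,
$i = 1, 2, 3$, then

$$(I_{k-1}^k v)(q_i') = v(q_i'), \qquad i = 1, 2, 3,$$

$$(I_{k-1}^k v)(q_i) = \begin{cases} \dfrac{1}{M_1} \sum_{K_j} v|_{K_j}(q_i) & \text{if } q_i \notin \partial\Omega, \\[1ex] \dfrac{1}{M_2} \sum_j v(q_j'') & \text{if } q_i \in \partial\Omega, \end{cases}$$

where $M_1$ is the number of elements $K_j \in \mathcal{E}_{k-1}$ that meet at $q_i$ and $M_2$ is defined as
in Example 2.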

Note that Examples 2 and 3 define the value of $I_{k-1}^k v$ at the vertices of elements
in $\mathcal{E}_k$ and thus lead to a continuous piecewise linear function on $\mathcal{E}_k$. Hence $I_{k-1}^k v$
is obviously in $V_k$. Also, since the operators in Examples 2 and 3 do not preserve
the energy norm, we can only establish the optimal convergence properties of the
W-cycle multigrid algorithm and the uniform condition number estimates for the variable
V-cycle preconditioner via the standard convergence proof [4, 8], as in Example 1.

With the three definitions above, we now state a convergence result, whose proof
is given in [18].

The convergence rate for Algorithm 2.1 on the $k$th level is measured by a
convergence factor $\delta_k$ that satisfies

$$|a_k((I - B_k A_k)v, v)| \le \delta_k\, a_k(v, v), \qquad \forall v \in V_k.$$

Theorem. (i) Define $B_k$ by $p = 2$ for all $k$ in Algorithm 2.1. Then there exists
$C > 0$ independent of $k$ such that, for a large enough $m$,

$$\delta_k \le \delta \equiv \frac{C}{C + \sqrt{m}}.$$

(ii) Define $B_k$ by $p = 1$ and $m(k) = 2^{K-k}$ for $k = 1, \ldots, K$ in Algorithm 2.1. Then
there are $\eta_0, \eta_1 > 0$, independent of $k$, such that

$$\eta_0\, a_k(v, v) \le a_k(B_k A_k v, v) \le \eta_1\, a_k(v, v), \qquad \forall v \in V_k,$$

with $\eta_0 \ge \dfrac{m(k)^{1/2}}{C + m(k)^{1/2}}$ and $\eta_1 \le \dfrac{C + m(k)^{1/2}}{m(k)^{1/2}}$.

EXAMPLE 4. We now define the operator $I_{k-1}^k : V_{k-1} \to V_k$ by

$$a_k(I_{k-1}^k v, w) = a_k(v, w), \qquad \forall v \in V_{k-1},\ w \in V_k. \tag{2.6}$$

With this definition, the inequality (2.5) is trivially satisfied with $C = 1$. Hence the
abstract theory in [8] can be applied to show convergence of both the V-cycle and the
W-cycle algorithms with one smoothing iteration. However, the $I_{k-1}^k$ in (2.6) is not
practical. The cost to obtain $I_{k-1}^k v$ for $v \in V_{k-1}$ is almost the same as that to solve
the original linear system. The problem is that $I_{k-1}^k v$ cannot be explicitly determined.
To get around this obstacle, we now consider the operator $I_k^{k-1} : V_k \to V_{k-1}$ defined
by

$$a_{k-1}(I_k^{k-1} v, w) = a_k(v, I_{k-1}^k w), \qquad \forall v \in V_k,\ w \in V_{k-1}. \tag{2.7}$$

The operator $I_k^{k-1}$ can be explicitly determined by the simple relation (the proof will
be presented in a forthcoming paper)

$$(I_k^{k-1} v)(q) = \tfrac{1}{2}\left(v(q_1) + v(q_2)\right),$$

for $v \in V_k$ (see Figure 1). With use of the operator $I_k^{k-1}$ and its transpose $I_{k-1}^k$
in Algorithm 2.1, we can prove convergence of both the V-cycle and the W-cycle
algorithms. This will be given in the forthcoming paper. We remark that the same
construction of the operator $I_k^{k-1}$ can be carried out for Morley's elements for the
biharmonic problem.
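The claim that (2.5) holds with $C = 1$ for the operator defined in (2.6) can be checked in two lines. The following is a sketch of the standard argument, writing $I$ for the operator of (2.6) and assuming only that each coarse triangle is the union of the fine triangles obtained by joining its edge midpoints.

```latex
% Take w = I v in (2.6) and apply the Cauchy-Schwarz inequality for a_k:
a_k(Iv, Iv) = a_k(v, Iv) \le a_k(v, v)^{1/2}\, a_k(Iv, Iv)^{1/2}
\quad\Longrightarrow\quad a_k(Iv, Iv) \le a_k(v, v).
% Since v is linear on every coarse triangle, summing over fine or coarse
% triangles gives the same value:
a_k(v, v) = \sum_{E \in \mathcal{E}_k} (a \nabla v, \nabla v)_E
          = \sum_{E \in \mathcal{E}_{k-1}} (a \nabla v, \nabla v)_E
          = a_{k-1}(v, v), \qquad v \in V_{k-1}.
```

Combining the two lines gives $a_k(Iv, Iv) \le a_{k-1}(v, v)$, which is (2.5) with $C = 1$.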


MULTIGRID ALGORITHMS FOR MIXED METHODS 


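FIGURE 1. The illustration of the definition of $I_k^{k-1}$.

The Raviart-Thomas space [31] over triangles is given by

$$\Lambda_h = \{v \in (L^2(\Omega))^2 : v|_E = (a_E^1 + a_E^2 x,\ a_E^3 + a_E^2 y),\ a_E^i \in \mathbb{R},\ E \in \mathcal{E}_h\},$$
$$W_h = \{w \in L^2(\Omega) : w|_E \text{ is constant for all } E \in \mathcal{E}_h\},$$
$$L_h = \{\mu \in L^2(\partial\mathcal{E}_h) : \mu|_e \text{ is constant},\ e \in \partial\mathcal{E}_h;\ \mu|_e = 0,\ e \subset \partial\Omega\},$$

where $\partial\mathcal{E}_h$ denotes the set of all interior edges. Then the hybrid form of the mixed
method for (1.1) is to seek $(\sigma_h, u_h, \lambda_h) \in \Lambda_h \times W_h \times L_h$ such that

$$\begin{aligned}
\sum_{E \in \mathcal{E}_h} (\nabla\cdot\sigma_h, w)_E &= (f, w), && \forall w \in W_h, \\
(\alpha\sigma_h, v) - \sum_{E \in \mathcal{E}_h} \left[(u_h, \nabla\cdot v)_E - (\lambda_h, v\cdot\nu_E)_{\partial E}\right] &= 0, && \forall v \in \Lambda_h, \\
\sum_{E \in \mathcal{E}_h} (\sigma_h\cdot\nu_E, \mu)_{\partial E} &= 0, && \forall \mu \in L_h,
\end{aligned} \tag{3.1}$$

where $\nu_E$ denotes the unit outer normal to $E$ and $\alpha = a^{-1}$. The solution $\sigma_h$ is
introduced to approximate the vector field

$$\sigma = -a\nabla u,$$

which is the variable of primary interest in many applications. Since $\sigma$ lies in the
space

$$H(\mathrm{div}; \Omega) = \{v \in (L^2(\Omega))^2 : \nabla\cdot v \in L^2(\Omega)\},$$

and we do not require that $\Lambda_h$ be a subspace of $H(\mathrm{div}; \Omega)$, the last equation in (3.1) is
used to enforce that the normal components of $\sigma_h$ are continuous across the interior
edges in $\partial\mathcal{E}_h$, so in fact $\sigma_h \in H(\mathrm{div}; \Omega)$.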
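A short check explains why multipliers that are constant on each edge suffice to enforce this continuity. The computation below is a sketch of a standard property of the lowest-order Raviart-Thomas functions; the symbols $n = (n_1, n_2)$ and $d$, describing the line that contains an edge, are introduced here only for illustration.

```latex
% Let the edge e lie on the line n_1 x + n_2 y = d with unit normal n = (n_1, n_2).
% For v|_E = (a^1 + a^2 x, a^3 + a^2 y) the normal component on e is
v \cdot n = a^1 n_1 + a^3 n_2 + a^2 (n_1 x + n_2 y)
          = a^1 n_1 + a^3 n_2 + a^2 d ,
% which does not depend on the position along e.
```

Because the normal component is constant on every edge, testing its jump against constants on each interior edge, as in the last equation of (3.1), forces the jump to vanish identically.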

There is no continuity requirement on the spaces $\Lambda_h$ and $W_h$, so $\sigma_h$ and $u_h$ can be
locally (element by element) eliminated from (3.1). In fact, from [15], (3.1) can be
algebraically condensed to the symmetric, positive definite system for the Lagrange
multiplier $\lambda_h$:

$$M_h \lambda_h = F_h, \tag{3.2}$$

where the contributions of the triangle $E$ to the stiffness matrix $M_h$ and the right-hand
side $F_h$ are



' v*e 
( a , 1)e ’ 



{aJ^V l E)E 
( a , 1 )e 



(3.3) 


where $\nu_E^i$ denotes the outer unit normal to the edge $e_E^i$, $\bar\nu_E^i = |e_E^i|\,\nu_E^i$, $|e_E^i|$ is the length
of $e_E^i$, $J_E^f = (f, 1)_E\,(x, y)/(2|E|)$, and $|E|$ denotes the area of $E$.
Let $P_h$ denote the $L^2(\Omega)$ projection operator onto $W_h$, $\alpha_h = P_h\alpha$, and $f_h = P_h f$.
Also, set

A|e = f( 3 “ Ie ’ 


and 


ah(^,fa = V(/p) s . 

Eee h 


Then as shown in [15], the system (3.2) corresponds to the system arising from the 
triangular nonconforming finite element method: find fa € 14 such that 


ah(fa, fa) = (fh, fa, Vv? G 14. 


(3.4) 


Hence Algorithm 2.1 can be used to solve (3.2), i.e., the mixed method (3.1). It should
also be noted that the natural degrees of freedom of $L_h$ and $V_h$, i.e., the values at the
midpoints of edges, are the same.

After the computation of $\lambda_h$, $\sigma_h$ and $u_h$ (if they are needed) can be recovered as
follows. Set $\sigma_h|_E = (a_E + b_E x,\ c_E + b_E y)$ and $J_E = f_h|_E$. Then it follows from [15]
that

b E = S -f, 

a £ = ~(zd)4 (^i=l I^eWe 1 ^ ^h\e% + £f-(a,x) E ^ , 

cs= ~wn (e?=i . 

and 

u h\ E = (( aa h,(x,y)) E + ^fa\e%(( x ,y), u E )e%J , E £ £ h . 

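We now consider a modified version of the mixed method (3.1) in which the
coefficient $\alpha$ is projected into the space $W_h$ [20]: find $(\sigma_h, u_h, \lambda_h) \in \Lambda_h \times W_h \times L_h$
such that

$$\begin{aligned}
\sum_{E \in \mathcal{E}_h} (\nabla\cdot\sigma_h, w)_E &= (f, w), && \forall w \in W_h, \\
(\alpha_h\sigma_h, v) - \sum_{E \in \mathcal{E}_h} \left[(u_h, \nabla\cdot v)_E - (\lambda_h, v\cdot\nu_E)_{\partial E}\right] &= 0, && \forall v \in \Lambda_h, \\
\sum_{E \in \mathcal{E}_h} (\sigma_h\cdot\nu_E, \mu)_{\partial E} &= 0, && \forall \mu \in L_h.
\end{aligned} \tag{3.5}$$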

Associated with this projected formulation, the linear system has the form in place 
of (3.3): 

= ^ e = - (Je ’J e)e +(4.44. E€£ h . (3.6) 

The corresponding nonconforming system becomes: find fa £ 14 such that 


a h (fa, fa = (fh, fa, e 14, 


(3.7) 



The present systems in (3.6) and (3.7) are simpler than the corresponding systems 
in (3.3) and (3.4). The advantage of the projected mixed formulation over the usual 
one is more obvious for the mixed finite element method over rectangles, which will 
be carried out in the next section. 


RECTANGULAR ELEMENTS 


In this section we consider the lowest order Raviart-Thomas space over rectangles
[31]. Let $\mathcal{E}_h$ be a partition of $\Omega$ into rectangles oriented along the coordinate axes,
and let $Q_{i,j}(E)$ be the space of polynomials of degree at most $i$ in $x$ and $j$ in
$y$. The rectangular mixed space [31] is defined by

$$\Lambda_h(E) = Q_{1,0}(E) \times Q_{0,1}(E),$$
$$W_h(E) = P_0(E),$$
$$L_h(e) = P_0(e).$$

We first consider the usual mixed method (3.1). For each $E \in \mathcal{E}_h$, set


n/ x — 

. (<x,x 2 )e 

( ot,x) B 

y _ (<*,y 2 )fi 

(°'V)b 

a E~ 

' (<*>Us 


E («.2/)j5 


n/ x . — 

(*4 )«,< 

B 

bral 

Q » 

Kfl 

a i - 
e E 

' (“-Us 


e' E (a.y)fi 

(<*,l )e ' 

ii 

cq 

--(a,x) E a x E + ( a,y) E a y E . 




Then, following [15], it follows that the contributions of the rectangle E to the stiffness 
matrix and the right-hand side are 


m ?i = Wl ■ v 3 e +. i 


x. .A 1 ),,^ 1 ) 

e B 


E E 

-^(a,x) E (a,y) E a x eiB a y e ^ 1) u§ 2) 

+ i K y)W^ B v i 2) ^ 2) , 

FF=^{oc,x) E {a,y) E (a y E a X iUp + cf E c?:v\ 


(4.1) 


*( 2 ) 

'E 


Namely, we again have the linear system (3.2) for the Lagrange multiplier $\lambda_h$ for the
rectangular elements. Also, after the calculation of $\lambda_h$, we can compute $\sigma_h$ and $u_h$ as
follows. For each $E \in \mathcal{E}_h$, let $\sigma_h|_E = (a_E + b_E x,\ c_E + d_E y)$. Then we have
a-E = {Et=i[((^x)la x ^-A E \e^\)uf ) -(a,x) E (oc,y) E oil iB uf )>l j\ h \ e i E 

-(a, x) E (a, y) E a y E J E }/ ((a, 1 ) E A E ) , 

C E = [EtiKKy)!^ - ^fil e kl) v e 2) ~ {a,x) E {(x,y)E(y x e i E v E ) }^h | e > B 
-(a, x) E {a , y)sa|7£;|/ ((a, 1 )sA e ) , 

b E = ^ Ef =1 (“(«, x) E c%i t i4 1) + («» J/)s«e. fi ^ 2) ) ^le> B + y)Ea y E J E , 

d E = jjELi A ftle* B + T^{^x) E a x E J E , 

and 


u E 


= (a ’^^ v)E {E + «Ki^ 2) ) ^l.<, + 44 /e}- 


We now consider the projected mixed method (3.5) on rectangles. For each E E £’/ l , 
let = \v E \ — \v e 2 \-, and let Ax E and A y E denote the x-length and the y-length 
of E , respectively. Then (4.1) reduces to 


^=- i % k + (J , i,4k, 


where 


= Ax| + Ay|, J s E = j^~ (A y%x, Ax\y) . 


The cTh and Uh can be computed in a much simpler way. Let (x E ,y E ) denote the 
center of the rectangle E. Then we have 




6*=p|i|fcEL,(-l4 1) l + kF ) l) + 


c E 


(a,l) E R E 
_ \E\ 


A 7^ U E 

&y%jB 
Re 5 


£?=,fe (-l^”l + 14° 


l) _ 4^ 1 4 21 ]"'il.; 


XE&ypjE 
Re ' 


ve^ 2 pJe 
Re ' 


and 


(oA ) e^=1\Rb \ ^ e 11 V A yp^E 

j _ 6 l g l y~>4 f\ *(!) I _ I l) I & x %f B 
dE -(a,\) E R B E-»=l \\ U E I I V E \) + R b ) 


u k \z = (Avll4 l) l + A4l4 2) l) + {a, ^ E} fB- 
Now we see that the projected mixed method produces a much simpler system. The
equivalence between the mixed method and the nonconforming method can be
established as in the triangular case [15]. In the present case, the corresponding
nonconforming space is

$$N_h = \Big\{\xi : \xi|_E = a_E^1 + a_E^2 x + a_E^3 y + a_E^4 (x^2 - y^2),\ a_E^i \in \mathbb{R},\ \forall E \in \mathcal{E}_h;\ \text{if } E_1 \text{ and } E_2 \text{ share an edge } e, \text{ then } \int_e \xi|_{\partial E_1}\, ds = \int_e \xi|_{\partial E_2}\, ds;\ \text{and } \int_{\partial E \cap \partial\Omega} \xi|_{\partial\Omega}\, ds = 0\Big\}.$$

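Moreover, the definition of Algorithm 2.1 remains the same here provided that a
coarse-to-fine intergrid transfer operator can be defined for the rectangular elements.
As an example, we give a variant of the operator in Example 1. Other cases can be
similarly extended.

Let $\{\mathcal{E}_{h_k}\}_{k=0}^K$ be a family of partitions of $\Omega$ such that $\mathcal{E}_{h_k} = \mathcal{E}_k$ is constructed
by connecting the midpoints of the edges of the rectangles in $\mathcal{E}_{k-1}$. Following [1], we
define the coarse-to-fine intergrid transfer operators $I_{k-1}^k : V_{k-1} \to V_k$ as follows. If
$v \in V_{k-1}$ and $e$ is an edge of a rectangle in $\mathcal{E}_k$, then $I_{k-1}^k v \in V_k$ is defined by

$$\frac{1}{|e|}\int_e I_{k-1}^k v\, ds = \begin{cases} 0 & \text{if } e \subset \partial\Omega, \\[0.5ex] \dfrac{1}{|e|}\displaystyle\int_e v\, ds & \text{if } e \not\subset \partial E \text{ for any } E \in \mathcal{E}_{k-1}, \\[1.5ex] \dfrac{1}{2|e|}\displaystyle\int_e \left(v|_{E_1} + v|_{E_2}\right) ds & \text{if } e \subset \partial E_1 \cap \partial E_2 \text{ for some } E_1, E_2 \in \mathcal{E}_{k-1}. \end{cases}$$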

The results in the previous theorem remain the same here [18]. We remark that
the analysis in the paper applies to differential problems with a tensor coefficient
and a lower order term. Also, while we only considered the Raviart-Thomas spaces
on triangles and rectangles, other mixed finite element spaces (see, e.g., [11, 12, 13,
19, 22, 28, 29, 31]) can be similarly dealt with. For more information on these
extensions, refer to [15, 16, 17, 18]. Finally, refer to [10, 23, 24, 25, 26, 30, 32, 34, 35]
for multigrid algorithms for mixed finite element methods that use approaches different
from the present one.


NUMERICAL EXPERIMENTS 


We present the results of two numerical examples to illustrate the theory
developed in the earlier sections and to compare the results obtained here with those
generated by the well established conforming finite element and
finite difference multigrid algorithms [7, 8]. For this reason we use the numerical data given
in these earlier papers. These results are reported in [18]; more numerical results
can be found in [15]. Numerical comparisons among the operators
described in section two are left for future work.

EXAMPLE 1. In the first example we consider the Laplace equation on the unit
square

$$-\Delta u = f \ \text{ in } \Omega = (0, 1)^2, \qquad u = 0 \ \text{ on } \partial\Omega. \tag{5.1}$$

$h_K$      $(\kappa_v, \delta_v)$    $(\kappa_w, \delta_w)$    $(\kappa_{vv}, \delta_{vv})$
1/8        (1.48, .43)               (1.46, .40)               (1.45, .39)
1/16       (1.64, .46)               (1.47, .42)               (1.47, .41)
1/32       (1.81, .50)               (1.48, .43)               (1.48, .43)
1/64       (1.86, .51)               (1.48, .43)               (1.48, .43)
1/128      (1.96, .54)               (1.48, .43)               (1.48, .43)

Table 1. Convergence Results for Example 1


We approximate the solution to (5.1) using the triangular nonconforming method
(i.e., the triangular mixed method). The analysis of section two guarantees that
the condition number of $B_K A_K$ for the variable V-cycle algorithm can be bounded
independently of the number of levels and that the W-cycle algorithm has an optimal
convergence property. Table 1 gives the condition number $\kappa$ for the system $B_K A_K$
and the reduction factor $\delta$ for the system $I - B_K A_K$ as a function of the mesh size
on the finest grid, where the V-cycle, W-cycle, and variable V-cycle algorithms are
indicated by $(\kappa_v, \delta_v)$, $(\kappa_w, \delta_w)$, and $(\kappa_{vv}, \delta_{vv})$, respectively. The V-cycle and W-cycle
schemes use one smoothing step. The coarse-to-fine intergrid transfer operator in
Example 1 of section two is used. (To see how the convergence rate depends upon
the number of the smoothing steps, refer to [15].) For all of the runs, the coarse grid
is of size $h_0 = 1/2$. As noticed in the conforming case [8], the variable V-cycle and
the W-cycle algorithms have essentially identical computational results. This is due
to the fact that both algorithms have exactly the same number of total smoothings
on each grid in the multi-level iteration. While there is no complete theory for the
V-cycle algorithm with the averaging transfer operator, it is of practical interest that
the condition numbers for this cycle remain relatively small, although the convergence
rate deteriorates as the mesh size decreases. Finally, compared with the numerical results
obtained in [7, 8], we see that the nonconforming multigrid algorithms in fact compare
favorably with these standard multigrid algorithms.
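The remark that the W-cycle and the variable V-cycle perform the same number of smoothings on each grid follows from counting visits in the recursion of Algorithm 2.1: level $k$ is visited $p^{K-k}$ times per application of $B_K$. The small sketch below (the function name `smoothing_sweeps` is hypothetical) counts pre-smoothing sweeps per level for both schedules.

```python
def smoothing_sweeps(K, p, m):
    """Pre-smoothing sweeps on levels 1..K for one application of B_K.

    p is the number of coarse-grid corrections per level (1 = V, 2 = W);
    m is a function giving the number of smoothing steps m(k) on level k.
    """
    visits = {K: 1}
    for k in range(K, 1, -1):
        visits[k - 1] = p * visits[k]          # each visit spawns p coarser visits
    return {k: visits[k] * m(k) for k in range(1, K + 1)}


K = 6
w_cycle = smoothing_sweeps(K, p=2, m=lambda k: 1)                  # W-cycle, one smoothing
variable_v = smoothing_sweeps(K, p=1, m=lambda k: 2 ** (K - k))    # m(k) = 2^{K-k}
print(w_cycle == variable_v)
```

The two schedules agree level by level, which is consistent with the nearly identical second and third columns of Table 1.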

EXAMPLE 2. In the second example we consider the following model problem
with a variable coefficient $a(x)$:

$$-\nabla\cdot(a\nabla u) = f \ \text{ in } \Omega = (0, 1)^2, \qquad u = 0 \ \text{ on } \partial\Omega.$$

Results similar to those in Table 1 are obtained for this problem using the triangular
elements. Hence we examine the rectangular elements. This example uses the same set
of data as the first example does. The numerical results are shown in Table 2. The
same facts as in the first example are also observed here.

$h_K$      $(\kappa_v, \delta_v)$    $(\kappa_w, \delta_w)$    $(\kappa_{vv}, \delta_{vv})$
1/8        (1.54, .44)               (1.50, .42)               (1.51, .42)
1/16       (1.65, .46)               (1.52, .44)               (1.52, .43)
1/32       (1.86, .51)               (1.53, .45)               (1.53, .45)
1/64       (1.95, .53)               (1.53, .45)               (1.53, .45)
1/128      (2.07, .60)               (1.53, .45)               (1.53, .45)

Table 2. Convergence Results for Example 2

SOLVING ELLIPTIC PROBLEMS IN
STRENGTHENED SOBOLEV SPACES

SUMMARY

Fourth-order elliptic boundary value problems in the plane can be reduced to
operator equations in Hilbert spaces $G$ that are certain subspaces of the Sobolev
space $W_2^2(\Omega) \equiv G^{(2)}$. Appearance of asymptotically optimal algorithms for
Stokes type problems made it natural to focus on an approach that considers
$\operatorname{rot} w \equiv [D_2 w, -D_1 w] \equiv \vec u$ as a new unknown vector-function, which
automatically satisfies the condition $\operatorname{div} \vec u = 0$. In this work, we show that this approach
can also be developed for an important class of problems from the theory of
plates and shells with stiffeners. The main mathematical problem was to show
that the well-known inf-sup condition (normal solvability of the divergence
operator) holds for special Hilbert spaces. This result is also essential for certain
hydrodynamics problems.


1. INTRODUCTION 


Fourth-order elliptic boundary value problems can be reduced to operator
equations in Hilbert spaces $G$ that are certain subspaces of the Sobolev space
$W_2^2(\Omega) \equiv G^{(2)}$. Construction of asymptotically optimal grid approximations
and, most particularly, of asymptotically optimal algorithms remains very difficult
because the associated spline subspaces are not of Lagrangian type. These
difficulties evoked a series of attempts to reduce such problems to second-order
differential equations, but with no essential progress in the construction of
asymptotically optimal algorithms.

Appearance of asymptotically optimal algorithms for Stokes type problems
made it natural to focus on an approach that considers

$$\vec u \equiv [u_1, u_2] \equiv [D_2 w, -D_1 w] \equiv \operatorname{rot} w \tag{1.1}$$

as a new unknown vector-function, which automatically satisfies the condition
$\operatorname{div} \vec u = 0$ (see [1-9]). (This condition explains why we prefer to use $\operatorname{rot} w$
instead of $\operatorname{grad} w$.)
In what follows, we assume, for simplicity, that $\Omega$ is a simply connected
domain with Lipschitz piecewise smooth boundary $\Gamma$. We suppose that

$$W \equiv W_2^2(\Omega; \Gamma_0)$$

consists of $w \in G^{(2)}$ that, with their first derivatives, vanish on the set $\Gamma_0 \subset \Gamma$,
where one-dimensional measures of $\Gamma_0$ and $\Gamma_1 \equiv \Gamma \setminus \Gamma_0$ are positive and $\Gamma_0$ is a
connected arc.

We start by considering classical variational problems (plates without
stiffeners), which lead to the variational problem of finding

$$w = \arg\min_{w' \in W} \Phi(w'), \tag{1.2}$$

where the energy functional is defined by

$$\Phi(w) \equiv I_2(w) - 2\,l(w), \tag{1.3}$$

$$I_2(w) \equiv \sum_{s=1}^{2}\sum_{r=1}^{2} \left(a_{s,r}, (D_s D_r w)^2\right)_0 + 2\left(a_0, D_1^2 w\, D_2^2 w\right)_0, \tag{1.4}$$

the conditions

$$a_{1,2} = a_{2,1}, \qquad a_{s,r}(x) \ge \kappa_0 > 0,\ s = 1, 2,\ r = 1, 2, \qquad a_{1,1}(x)\,a_{2,2}(x) - a_0^2(x) \ge \kappa_1 > 0, \quad \forall x \in \Omega, \tag{1.5}$$

are satisfied, and

$$l(w) \equiv (f_{1,1}, D_2 w)_0 - (f_{1,2}, D_1 w)_0. \tag{1.6}$$

Here, $(u, v)_0 \equiv (u, v)_{L_2(\Omega)}$ and $f_{1,r} \in L_2(\Omega)$, $r = 1, 2$.

Next, we consider a subset $S$ of $\bar\Omega$ consisting of straight line segments
(stiffeners or stringers) $S_1, \ldots, S_m$. For simplicity, we assume that the end points
of each stiffener belong to $\Gamma$. Thus (considered as cutting lines), they define a
partition of $\bar\Omega$ into a set of blocks (panels) $P_1, \ldots, P_{m'}$. We also assume that, if
an inner point of $S_r$ belongs to $\Gamma$, then $S_r$ belongs to $\Gamma_1$ (note that $m' = 1$
if $S \subset \Gamma$). ($\Gamma' \equiv \Gamma \cup S$ corresponds to the union of the panel boundaries.) We
replace $I_2(w)$ by

$$\tilde I_2(w) \equiv I_2(w) + \sum_{r=1}^{m} \int_{S_r} \left[c_{r,1}(D_s^2 w)^2 + c_{r,2}(D_s D_n w)^2\right] ds, \tag{1.7}$$

where $c_{r,1}$ and $c_{r,2}$ are positive constants ($r \in [1, m]$), $s$ and $n$ refer to the
respective arclength parameter and normal with respect to $S_r$, $r \in [1, m]$, and
the Hilbert space $W$ consists of functions in $W_2^2(\Omega; \Gamma_0)$ with special traces of
$D_s w$ and $D_n w$ on each $S_r$. These traces must belong to $W_2^1(S_r)$, $r \in [1, m]$, so
we may define the inner product $(w, w')_W$ by

$$(w, w')_{2,\Omega} + \sum_{r=1}^{m} \left[(c_{r,1}, D_s^2 w\, D_s^2 w')_{0,S_r} + (c_{r,2}, D_s D_n w\, D_s D_n w')_{0,S_r}\right]. \tag{1.8}$$

If the end points of a stiffener $S_r$ belong to $\Gamma_0$, then these traces must belong to
$\mathring{W}_2^1(S_r)$. The case with only one end point of $S_r$ on $\Gamma_0$ is fairly similar. Also,
we may replace $l(w)$ in (2.8) by

$$\tilde l(w) \equiv l(w) + \sum_{r=1}^{m} \left[(f'_{r,1}, D_s^2 w)_{0,S_r} + (f'_{r,2}, D_s D_n w)_{0,S_r}\right], \tag{1.9}$$

where $f'_{r,1} \in L_2(S_r)$, $f'_{r,2} \in L_2(S_r)$, $r \in [1, m]$. This implies that we deal with
the original variational problem

$$w = \arg\min_{w' \in W} \left[\tilde I_2(w') - 2\,\tilde l(w')\right]. \tag{1.10}$$

First use of analogous problems in pre-Hilbert spaces dates back to the paper
of S. Timoshenko in 1915; see also [10,11].


2. REDUCTION TO STOKES TYPE SYSTEMS 


Let $s \equiv s_r \equiv [\cos\alpha_r, \sin\alpha_r]$ determine the direction of $S_r$, $r \in [1, m]$. Then

$$n \equiv n_r \equiv [-\sin\alpha_r, \cos\alpha_r]$$

and, in accordance with (1.1), on $S_r$, we have

$$D_s w = -\cos\alpha_r\, u_2 + \sin\alpha_r\, u_1 \equiv I_{r,s}(\vec u)$$

and

$$D_n w = \sin\alpha_r\, u_2 + \cos\alpha_r\, u_1 \equiv I_{r,n}(\vec u), \qquad r \in [1, m].$$
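These two identities follow from $D_1 w = -u_2$ and $D_2 w = u_1$, and they are easy to confirm numerically. The sketch below uses an arbitrary polynomial test function; the function names and the choice $w = x^2 y + y^3$ are illustrative assumptions, not part of the paper.

```python
import math


def grad_w(x, y):
    """Gradient [D_1 w, D_2 w] of the test function w = x^2 y + y^3."""
    return (2.0 * x * y, x * x + 3.0 * y * y)


def rot_w(x, y):
    """u = rot w = [D_2 w, -D_1 w], as in (1.1)."""
    d1, d2 = grad_w(x, y)
    return (d2, -d1)


def I_s(u, alpha):
    """Tangential-derivative functional: -cos(alpha) u_2 + sin(alpha) u_1."""
    return -math.cos(alpha) * u[1] + math.sin(alpha) * u[0]


def I_n(u, alpha):
    """Normal-derivative functional: sin(alpha) u_2 + cos(alpha) u_1."""
    return math.sin(alpha) * u[1] + math.cos(alpha) * u[0]


def directional(x, y, d):
    """Directional derivative of w along the unit vector d."""
    g = grad_w(x, y)
    return g[0] * d[0] + g[1] * d[1]


alpha, x0, y0 = 0.7, 0.3, -1.2
s = (math.cos(alpha), math.sin(alpha))
n = (-math.sin(alpha), math.cos(alpha))
u = rot_w(x0, y0)
print(abs(directional(x0, y0, s) - I_s(u, alpha)))   # difference for D_s w
print(abs(directional(x0, y0, n) - I_n(u, alpha)))   # difference for D_n w
```

Both printed differences are at the level of rounding error, for any angle and any evaluation point.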

With the Hilbert space $W$ in (1.10), we associate a Hilbert space $\operatorname{rot} W$. This
we describe by introducing a Hilbert space $G_1 \subset (W_2^1(\Omega; \Gamma_0))^2$, whose elements
are vector fields $\vec u$ belonging to $(W_2^1(\Omega; \Gamma_0))^2$ and such that the traces of $I_{r,s}(\vec u)$
and $I_{r,n}(\vec u)$ on $S_r$ (they exist in the sense of traces of functions in $W_2^1(\Omega)$) satisfy

$$I_{r,s}(\vec u) \in W_2^1(S_r), \qquad I_{r,n}(\vec u) \in W_2^1(S_r), \qquad r \in [1, m]. \tag{2.1}$$

The inner product in $G_1$ is defined by

$$(\vec u, \vec v)_{G_1} \equiv (\vec u, \vec v)_{1,\Omega} + \sum_{r=1}^{m} \left[(1, I_{r,s}(\vec u)\, I_{r,s}(\vec v))_{1,S_r} + (1, I_{r,n}(\vec u)\, I_{r,n}(\vec v))_{1,S_r}\right] \tag{2.2}$$

(if the end points of a stiffener $S_r$ belong to $\Gamma_0$, then the above traces must
belong to $(\mathring{W}_2^1(S_r))^2$; the case with only one end point of $S_r$ on $\Gamma_0$ is fairly
similar). Then $\operatorname{rot} W \subset G_1$ is a subspace of solenoidal vector fields.

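We replace (1.10) by the problem of finding $u \equiv [u_1, u_2] \in G \equiv G_1 \times G_2$ (with $G_2 \equiv
L_2(\Omega)$) such that

$$\begin{aligned}
b_{1,1}(u_1; u_1') + b_{1,2}(u_2; u_1') &= l_1(u_1'), && \forall u_1' \in G_1, \\
b_{2,1}(u_1; u_2') &= 0, && \forall u_2' \in G_2,
\end{aligned} \tag{2.3}$$

where

$$b_{1,1}(u_1; u_1') \equiv b_{1,1}^0(u_1; u_1') + \sum_{r=1}^{m} \left[c_{r,1}(1, I_{r,s}(u_1)\, I_{r,s}(u_1'))_{1,S_r} + c_{r,2}(1, I_{r,n}(u_1)\, I_{r,n}(u_1'))_{1,S_r}\right]$$

and

$$l_1(v_1) \equiv (f_{1,1}, v_{1,1})_0 + (f_{1,2}, v_{1,2})_0 + \sum_{r=1}^{m} \left[(f'_{r,1}, D_s I_{r,s}(v_1))_{0,S_r} + (f'_{r,2}, D_s I_{r,n}(v_1))_{0,S_r}\right]. \tag{2.4}$$

Here $b_{1,1}^0(u_1; u_1')$ is the bilinear form associated with the case where $S = \emptyset$ and

$$b_{2,1}(u_1; u_2') \equiv (\operatorname{div} u_1, u_2')_0 \equiv b_{1,2}(u_2'; u_1).$$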


The following lemma is fundamental (necessary proofs can be found in [9]).

Lemma 2.1. Let $P$ be a domain with piecewise smooth boundary $\partial P$. Suppose
that $\partial P$ contains a straight line segment $\Gamma_1(P) \equiv S^*$ and let $\Gamma_0(P) \equiv \partial P \setminus S^*$.
Suppose also that the Hilbert space $G_1(P)$ is defined as in (2.2) with only one
stiffener $S^*$. Then, a constant $K^*$ and $\vec v^* \in G_1(P)$ exist such that

$$(\vec v^*, n)_{0,S^*} = 1$$

and

$$\left[|\vec v^*|_{1,P}^2 + |D_s \vec v^*|_{0,S^*}^2\right]^{1/2} \le K^*\, |\operatorname{div} \vec v^*|_{0,P}. \tag{2.5}$$

Theorem 2.1. Let the Hilbert space $G_1$ be defined as above and let
$G_2 \equiv L_2(\Omega)$. Suppose that $S \subset \Gamma_1$. Then there exists $\sigma_0 > 0$ such that

$$\sup_{\vec u \in G_1} \frac{(\operatorname{div} \vec u, p)_{0,\Omega}}{\|\vec u\|_{G_1}} \ge \sigma_0\, |p|_{0,\Omega}, \qquad \sigma_0 > 0, \quad \forall p \in G_2 \tag{2.6}$$

holds.

The obtained result is a generalization of the well-known inf-sup condition
(see [6,8-10,12,13]; it is interesting that first attempts to analyze relevant
problems were made in [14]). We note also that (2.6) can be written in the form


|div*|| < a x . 


Theorem 2.2. Let the Hilbert space $G_1$ be defined as above and $G_2 \equiv L_2(\Omega)$.
Suppose also that the partition of $\bar\Omega$ into a set of panels $P_1, \ldots, P_{m'}$ is such that
each pair $P_i$ and $P_{i+1}$, $i \in [1, m' - 1]$, has a common side $S_{i,i+1} \subset S$ and $P_{m'}$
has a side on $\Gamma_1$ (which might belong to $S$). Then there exists $\sigma_0 > 0$ such that
(2.6) holds.

Theorem 2.3. Consider variational problem (1.10) replaced by (2.3). Suppose
that $S$ is such that the respective spaces $G_1$ and $G_2$ lead to (2.6). Then the rotor
of the solution of (1.10) is the first component of the solution of (2.3).

Similar results hold for the more difficult problem that differs from (2.3) in 

0 

choices of G\ and G 2 . Elements of G\ = G\ = (W 2 (fi)) 2 are vector fields u, 
0 

belonging to (Wf (Q)) 2 , such that the traces of I r ,s(v) and / r>3 (u) on S r satisfy 

(2.1); the inner product in G\ is defined by (2.2); and G 2 = ^(fi) \ 1 = G-j- 

0 

This problem is associated with (1.10) under the choice W = (W 2 (Q)) 2 (the 
inner product is defined by (2.2)). 


3. PROJECTIVE-GRID (MIXED FINITE 
ELEMENT) METHODS 


We confine ourselves to domains such that $\Gamma$ is a closed broken line. We
can then apply triangulations $T_h(\bar\Omega)$ (possibly composite with a finite number
of the levels of local refinement) and make use of spline spaces $\hat G_1 \equiv G_{1,h}$ and
$\hat G_2 \equiv G_{2,h}$. Here $\hat G_2$ consists of piecewise constant functions with respect to
the triangles $T \in T_h(\bar\Omega)$ (or augmented triangles in the case of composite
triangulations with local refinement as indicated in [8]); $\hat G_1$ consists of piecewise linear
functions with respect to the triangles $T_{h/2} \in T_{h/2}(\bar\Omega)$, where this triangulation
is a refinement of $T_h(\bar\Omega)$ with ratio 2. Note that there is nothing essentially
new in the construction of the subspaces $\hat G_1$ and $\hat G_2$ in comparison with the case where
$S = \emptyset$ because all elements of the old $\hat G_1$ belong to the new one. (Of course, it
is natural to construct original triangulations by refinement of the original
panels, so we assume that triangulations $T_h(\bar\Omega)$ yield triangulations $T_h(P_i)$ for all
panels $P_1, \ldots, P_{m'}$. We also assume that $\Gamma_0$ is a union of some sides of triangles
$T_h \in T_h(\bar\Omega)$.)

Theorem 3.1. For the spline spaces $\hat G_1$ and $\hat G_2$, there exists a constant $\sigma_0^* > 0$,
independent of $h$, such that

$$\sup_{\hat u \in \hat G_1} \frac{(\operatorname{div} \hat u, \hat p)_{0,\Omega}}{\|\hat u\|_{G_1}} \ge \sigma_0^*\, |\hat p|_{0,\Omega}, \qquad \forall \hat p \in \hat G_2. \tag{3.1}$$
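For a fixed pair of finite-dimensional spaces, the constant in a discrete inf-sup condition of this kind can be computed from matrices. The sketch below relies on a standard linear algebra fact and on hypothetical inputs: `B` holds the values of the divergence form on pairs of basis functions (rows for the pressure-like space, columns for the velocity-like space), and `Gu`, `Gp` are the symmetric positive definite Gram matrices of the two norms. None of these names come from the paper.

```python
import numpy as np


def inf_sup_constant(B, Gu, Gp):
    """Return min over p of max over u of (p^T B u) / (||u||_Gu ||p||_Gp)."""
    n_p, n_u = B.shape
    if n_p > n_u:
        return 0.0                          # B^T has a nontrivial kernel
    Lu = np.linalg.cholesky(Gu)             # Gu = Lu Lu^T
    Lp = np.linalg.cholesky(Gp)             # Gp = Lp Lp^T
    M = np.linalg.solve(Lp, B)              # Lp^{-1} B
    M = np.linalg.solve(Lu, M.T).T          # Lp^{-1} B Lu^{-T}
    return float(np.linalg.svd(M, compute_uv=False).min())


B = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
print(inf_sup_constant(B, np.eye(3), 4.0 * np.eye(2)))
```

Evaluating this quantity on a sequence of refined meshes is one way to observe numerically whether the constant stays bounded away from zero as $h$ decreases, which is what the theorem asserts.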

Now it is clear that the convergence of our PGMs can be analyzed in
accordance with the well-known theory. It is natural to make assumptions of the
form

(3.2) 

||fr,n(«l)||l+ 7 ,S r < Kr,n, Pr.a («1 )||l+7,Sr < Kr,s i (3.3) 

and 

IMkp, < K 2 ,u (3.4) 

where $i = 1, \ldots, m'$, $r = 1, \ldots, m$, and $\gamma \in (0, 1]$. Then it is easy to prove that
asymptotic approximation properties of the strengthened Sobolev spaces are the
same, and we can obtain the error estimates

$$\|\hat u_1 - u_1\|_{G_1} + \|\hat u_2 - u_2\|_0 \le K h^{\gamma}. \tag{3.5}$$


4. MULTIGRID CONSTRUCTION OF 
ASYMPTOTICALLY OPTIMAL 
PRECONDITIONERS 


Our PGM yields grid systems of the type

$$\begin{bmatrix} L_{1,1} & L_{1,2} \\ L_{2,1} & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ 0 \end{bmatrix} \tag{4.1}$$

such that $(L_{1,1} u_1, v_1)_{H_1} = b_{1,1}(\hat u_1; \hat v_1)$ for all $u_1 \in H_1$ and $v_1 \in H_1$.
Here $H_1$ and $H_2$ are standard Euclidean spaces associated with $\hat G_1$ and $\hat G_2$,
respectively ($\dim H_1 = \dim \hat G_1$, $\dim H_2 = \dim \hat G_2$); the operator $L_{2,1}$ is such
that

$$(L_{2,1} u_1, v_2) = (\operatorname{div} \hat u_1, \hat v_2)_{0,\Omega}, \qquad \forall \hat u_1 \in \hat G_1,\ \forall \hat v_2 \in \hat G_2,$$

and $L_{1,2} = L_{2,1}^*$.

The resulting system differs from the case where $S = \emptyset$ only for the nodes
on $S$. We confine ourselves to the case where

$$b_{1,1}(u_1; v_1) = (u, v)_{G_1}$$

and $(u, v)_{G_1}$ is just

$$(u, v)_{1,\Omega} + \sum_{r=1}^{m} \left[c_{r,1}(1, I_{r,s}(u)\, I_{r,s}(v))_{1,S_r} + c_{r,2}(1, I_{r,n}(u)\, I_{r,n}(v))_{1,S_r}\right].$$

Here, $c_{r,1}$ and $c_{r,2}$ are nonnegative numbers, $r \in [1, m]$, and

$$(u, v)_{1,\Omega} \equiv \sum_{k=1}^{2} (D_k u, D_k v)_{0,\Omega}.$$

We assume that we deal with standard nested triangulations

$$T^{(l+1)}(\bar\Omega) \equiv T^{(l+1)} \tag{4.2}$$

of levels $l + 1 = 1, \ldots, p$, where $T^{(0)}$ is the coarse triangulation, $T^{(p)} \equiv T_{h/2}(\bar\Omega)$,
and the refinement ratio is 2.

With each triangulation we associate a standard finite element subspace

$$\hat G^{(l)} \subset G \equiv W_2^1(\Omega; \Gamma_0) \tag{4.3}$$

consisting of functions that are continuous on the domain $\bar\Omega$ and piecewise linear (with
respect to this triangulation) and that vanish on $\Gamma_0$, $l = 0, \ldots, p$.

Let $\Omega^{(l)}$ be the set of vertices $P_i^{(l)}$ of triangles $T \in T^{(l)}$ that do not belong to
$\Gamma_0$, and let each vertex (node) $P_i^{(l)}$ be in correspondence with the standard
continuous piecewise linear basis function $\psi_i^{(l)}(x)$ such that $\psi_i^{(l)}(P_i^{(l)}) = 1$
and $\psi_i^{(l)} = 0$ at the remaining nodes, and $\psi_i^{(l)}(x)$ is linear on each triangle
$T \in T^{(l)}(\bar\Omega)$. Then

$$\hat G^{(l)} = \Big\{\hat u : \hat u = \sum_{P_i^{(l)} \in \Omega^{(l)}} u_i\, \psi_i^{(l)}(x)\Big\}, \qquad l = 0, \ldots, p. \tag{4.4}$$

where Ni+i is the number of nodes in Q( l+1 \ Ni + i = Ni + Nj 


(i) 


R JV '+ 1 = = H[ ,+1) x H { 2 i+1) , H% +1) = HV\ 
and


»,„ = {«} 6 H«*'\ », +1 = [„<'+», u<'«) £ ff(‘+'),s= 1,2. 

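Along with the nodal basis for $\hat G^{(l+1)}$, $l = 0, \ldots, p - 1$, we consider the
hierarchical basis leading to the splitting

$$\hat G^{(l+1)} = \hat G_1^{(l+1)} \oplus \hat G_2^{(l+1)} \subset G, \qquad l = 1, \ldots, p - 1, \tag{4.5}$$

where

$$\hat G_2^{(l+1)} \equiv \hat G^{(l)}$$

and

$$\hat G_1^{(l+1)} \equiv \{\hat u : \hat u \in \hat G^{(l+1)},\ \hat u(P_i^{(l)}) = 0 \text{ for all } P_i^{(l)} \in \Omega^{(l)}\}. \tag{4.6}$$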

Along with this splitting for G^ +1 \ we consider 

G (,+1) = G ( ! ,+1) ® G ( 2 ,+1) C V, l € [0,p - 1], (4.7) 

where the components of the vector-functions 

«('+!) = tk< a -' +1 >] g G( ,+1 > (4.8) 

belong to the spaces G^ ,+1 ^ and Gj +1 \ respectively. We emphasize that G^ +1 ^ = 
(5(0 and that the components of «( ,+1 ) G Gj ,+1 '* vanish at the vertices of trian- 
gles Tk G T( 0 . We note also that the Gram matrices for the two indicated bases 
for the space Gj ,+1 ^ take the standard block form 


£('+i) = 


r('+l) 

r('+l) 

^2,1 





rO+l) 

^1,1 

f((+l) 

^ 2,1 


f('+ 1) 
^1,2 
r(0 , 

^2,2 J 


(4.9) 


Lemma 4.1. The angle $\alpha$ between the subspaces $\hat G_2^{(l+1)} \equiv \hat G^{(l)}$ and $\hat G_1^{(l+1)}$ is
not smaller than the angle between the respective subspaces when $S = \emptyset$.

Proof. It suffices to introduce the semi-inner product

$$(u, v)_S \equiv \sum_{r=1}^{m} \left[c_{r,1}(1, I_{r,s}(u)\, I_{r,s}(v))_{1,S_r} + c_{r,2}(1, I_{r,n}(u)\, I_{r,n}(v))_{1,S_r}\right] \tag{4.10}$$

and to observe that $(u^{(l+1)}, u^{(l)})_S = 0$ if $u^{(l+1)} \in \hat G_1^{(l+1)}$. □

Note that if we deal only with isosceles right triangles in $T^{(l)}(\bar\Omega)$, then
$\alpha \ge \pi/4$.

Now in accordance with the theory of optimal model operators given in 
[9,15-17], we need to approximate the block Lj** 1 ^ = L^ 1 -* G C(h[ ,+1 ^) (here, 
£(if[ ,+1 ^) refers to the space of linear operators that map into itself). 
Lemma 4.2. Suppose that the basis functions for are indexed so that

the two basis functions associated with each node on S have consecutive numbers. 

Then there exists a block diagonal matrix with blocks in 

R 2x2 

or R lxl and constants ctq i > 0 and ci,i > 0, independent of l and 
coefficients c r i and c r> 2 (r £ [1, rn\), such that 




('+ 1 ) 

l 


< iiy > 




Proof. We may take 



(4.11) 


where, for all we have 


( 4 a > (,+1) .« (,+1 V +1 > = |a(,+1) ^ 

and 

(AgXV» (,+l) ,» ( ' +1, ) S M., = l|5<' +, »ll|. 

Moreover, we see that the first of these matrices is a positive diagonal matrix (its
elements are uniformly bounded) and the second is a nonnegative block diagonal
matrix (its elements are $O(1/h^{(l+1)})$). □

Note that if c r> i = c r> 2 , (r £ [1, m]), then * s diagonal. 

Theorem 4.1. Let the operator be the Gram matrix for the basis func- 
tions in Gi . Then there exists an asymptotically optimal model operator B\ ~ A 
such that the constants of spectral equivalence and the estimates of the required 
computational work in solving systems with B\ are independent of c r 1 and 
c r,2 (r € [l,m]). 

Proof. It suffices to apply construction of the model cooperative operators 
and J5(' +1 ) from [9,15-17] , in combination with the above lemmas. □ 
Now we define

$$B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}, \tag{4.12}$$

where $B_2$ is, for example, a diagonal matrix whose diagonal elements are areas
of our triangles in $T_h(\Omega)$. Then it can be proved (see [6,9]) that


11^ 1 ||h(b-*)m. h(B) < ^o" 1/,2 > 

where the constants are independent of h. These inequalities, probably for the 
first time, were discussed in [18,19] and written in the form 


^o|M|b < II^IIb-i < ^iIMIb> Vv £ H. 
They may be regarded as a consequence of the correctness of the original elliptic
problem and they yield inequalities

$$\delta_0 B \le L^* B^{-1} L \le \delta_1 B$$

and

(see [6,9,19,20]). Therefore, for solving (4.1), it is reasonable to use iterations

$$B u^{n+1} = B u^n - \tau_n L^* B^{-1} (L u^n - f), \tag{4.13}$$

convergence of which is determined by the constants $\delta_0$ and $\delta_1$ (more precisely,
by their quotient). Note that if we consider $L$ as a mapping of the Euclidean space
$H(B)$ into $H(B^{-1})$, then its conjugate operator $L'$ is given by $L' = B^{-1} L^* B^{-1}$
(in our case, we have $L^* = L$).

Thus, we actually work with the symmetrization defined by the chosen pair
of spaces; it leads to the system

$$Au = B^{-1} L^* B^{-1} L u = B^{-1} L^* B^{-1} f$$

with the symmetric operator $A$ considered as a mapping of the Euclidean space
$H(B)$ into itself.

In the case of the modified Richardson method (4.13), the adaptation procedure
from [6,9] for the constants $\delta_0$ and $\delta_1$ is available; the modified conjugate
gradient method can also be used.
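A minimal Python sketch of iteration (4.13) may help fix ideas. It assumes a small symmetric matrix L, a diagonal model operator B (stored as the list of its diagonal entries), and a fixed parameter tau. These three choices and the function name `modified_richardson` are illustrative assumptions and are not taken from the paper, where B is built from the model operators above and the iteration parameters are adapted.

```python
def modified_richardson(L, b_diag, f, tau, iterations):
    """Iterate B u_{n+1} = B u_n - tau * L^T B^{-1} (L u_n - f) for diagonal B."""
    n = len(f)
    u = [0.0] * n
    for _ in range(iterations):
        # Residual of the original system: L u - f
        res = [sum(L[i][j] * u[j] for j in range(n)) - f[i] for i in range(n)]
        # Apply B^{-1} (B is diagonal in this sketch)
        scaled = [res[i] / b_diag[i] for i in range(n)]
        # Apply L^* (the transpose; here L is symmetric)
        grad = [sum(L[j][i] * scaled[j] for j in range(n)) for i in range(n)]
        # Update: u <- u - tau * B^{-1} L^* B^{-1} (L u - f)
        u = [u[i] - tau * grad[i] / b_diag[i] for i in range(n)]
    return u


# Hypothetical test data: f is chosen so that the exact solution is [1, -1].
L = [[2.0, 1.0], [1.0, 3.0]]
b_diag = [2.0, 3.0]
f = [1.0, -2.0]
u = modified_richardson(L, b_diag, f, tau=0.5, iterations=500)
```

For this data the eigenvalues of $B^{-1} L^* B^{-1} L$ lie between roughly 0.35 and 1.98, so the fixed value tau = 0.5 gives a contraction; in general the admissible range of tau depends on $\delta_0$ and $\delta_1$.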


5. EXAMPLE OF SPECTRAL PROBLEMS 


Next, consider the special case of spectral problems in the strengthened
Sobolev spaces (this problem is connected with estimating the linear interpolation
error in a triangle for a function with standard extra smoothness and is important
for error estimates of the finite element method associated with piecewise
linear functions; similar problems were considered in [3,18] for common Sobolev
spaces).

Let $T$ be the triangle with vertices (0,0), (1,0), and (0,1), and let the
strengthened Sobolev space $W$ consist of functions $w \in W_2^2(T)$ that vanish at
these vertices and such that

$$\|D_2^2 w\|_{0,S} < \infty, \tag{5.1}$$

where $S$ denotes the vertical side of $T$. In other words, we assume that

$$\int_0^1 (D_2^2 w)^2 \, dx_2 < \infty.$$

We define the inner product in this strengthened Sobolev space by

$$(w, w')_W = (w, w')_{2,T} + (D_2^2 w, D_2^2 w')_{0,S}.$$

We seek

$$\lambda_1 = \min_{w \in W \setminus 0} \frac{|w|_W^2}{|w|_{1,T}^2 + |D_2 w|_{0,S}^2} \tag{5.2}$$

(see (3.2), (3.3) with $j = 1$).

The Hilbert space $G_1$ consists now of vector functions

$$u_1 \equiv \vec{u}_1 \equiv [u_{1,1}, u_{1,2}] \in (W_2^1(T))^2$$

such that

$$\varphi_1(u_1) \equiv \int_0^1 u_{1,1}(0, x_2) \, dx_2 = 0, \qquad \varphi_2(u_1) \equiv \int_0^1 u_{1,2}(x_1, 0) \, dx_1 = 0, \tag{5.3}$$

and

$$|D_2 u_{1,1}|_{0,S} < \infty. \tag{5.4}$$

We define the inner product in this strengthened Sobolev space by

$$(u_1, u_1')_{G_1} = (u_1, u_1')_{1,T} + (D_2 u_{1,1}, D_2 u_{1,1}')_{0,S}. \tag{5.5}$$

The space rot $W$ is defined by

$$\mathrm{rot}\, W \equiv \{u_1 : u_1 \in G_1 \text{ and } \mathrm{div}\, u_1 = 0\}. \tag{5.6}$$

Theorem 5.1. Problem (5.2) is equivalent to finding

$$\lambda_1 = \min_{u_1 \in \mathrm{rot}\, W \setminus 0} \frac{|u_1|_{G_1}^2}{|u_1|_{0,T}^2 + |u_{1,1}|_{0,S}^2} \tag{5.7}$$

and is reduced to a particular case of the eigenvalue problem:

$$(u_1; v_1)_{G_1} + b(v_1; u_2) = \lambda [(u_1; v_1)_{0,T} + (u_{1,1}, v_{1,1})_{0,S}], \quad \forall v_1, \tag{5.8}$$

$$b(u_1; v_2) = 0, \quad \forall v_2, \tag{5.9}$$

where

$$b(u_1; v_2) \equiv (\mathrm{div}\, u_1, v_2)_{0,T}. \tag{5.10}$$
Proof. It suffices to rewrite (5.2) in terms of rot $w$. It is also simple to obtain
(2.6) for the indicated $G_1$ and $G_2 = L_2(T)$. □

It is not difficult to show that, for the indicated problem, we may apply
PGMs, based on quasiuniform triangulations and local refinements around the
vertices of $T$ and in the vicinity of $S$. We stress that the bases for the approximating
spline subspaces $G_{1,h}$ and $G_{2,h}$ are the same as for the case $S = \emptyset$
(elements of $G_{1,h}$ must satisfy nonlocal condition (5.3)). We thus obtain PGM
of the form: Find $u_1 \in G_{1,h}$ and $u_2 \in G_{2,h}$ such that $u_1 \ne 0$ and

(« i ;^ i ) g , + b(yi\u 2 ) = A(iii;ui) 0 ,, Vt>i £ Gi./», (5.11) 


6(«i; V 2 ) = 0, Vif 2 £ G2,h. (5-12) 

It is very important that (3.1) holds. 

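Problem (5.11), (5.12) can be rewritten in operator form (in the Euclidean
space $H = H_1 \times H_2$) as

$$Lu \equiv \begin{bmatrix} L_{1,1} & L_{1,2} \\ L_{2,1} & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \lambda M u \equiv \lambda \begin{bmatrix} M_1 u_1 \\ 0 \end{bmatrix}. \tag{5.13}$$

To obtain effective algorithms for (5.13), we suggest the penalty method,
yielding the problem

$$\begin{bmatrix} L_{1,1} & L_{1,2} \\ L_{2,1} & -\alpha J_2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \lambda \begin{bmatrix} M_1 u_1 \\ 0 \end{bmatrix}, \quad \alpha > 0, \tag{5.14}$$

where $J_2$ is a diagonal positive matrix with the diagonal elements equal to areas
of augmented triangles in $T_h(T)$. Problem (5.14) is reduced to the standard
problem

$$S_1 u_1 \equiv (L_{1,1} + 1/\alpha \, L_{1,2} J_2^{-1} L_{2,1}) u_1 = \lambda M_1 u_1 \tag{5.15}$$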

in the Euclidean space $H_1$. Moreover, an asymptotically optimal model operator
$B_1$ for $L_{1,1}$ can be constructed as in Section 4 ($L_{1,1}$ and $B_1$ are spectrally
equivalent operators). This implies that it is possible to indicate a nearly
asymptotically optimal model operator $D_1$ for $S_1$ with the constants of spectral
equivalence independent of $\alpha$ (operators of such type were obtained in [4,6]).
Therefore, it is possible to indicate nearly asymptotically optimal algorithms
for problems of type (5.8)-(5.10) and solve them with high accuracy if need be.
But, when a moderate accuracy is required, a simpler approach based on the
penalty method might be more useful. For example, as in [21], it is possible to
replace (5.7) by

$$\lambda_1(\alpha) = \min_{u_1 \in G_1 \setminus 0} \frac{|u_1|_{G_1}^2 + 1/\alpha \, |\mathrm{div}\, u_1|_{0,T}^2}{|u_1|_{0,T}^2 + |u_{1,1}|_{0,S}^2} \tag{5.16}$$

and make use of the simplest triangulation and PGM.

Note that along the same lines we can consider problems that have stiffeners 
on other sides of our triangle. 

Our approach can be generalized to more general spectral problems in the 
strengthened Sobolev spaces typical for stability analysis of stiffened plates. 
Abstract

A parallel multilevel strategy is developed using spectral (p) finite elements. Hierar- 
chic bases are particularly well suited since the element matrices and vectors are nested 
and the multilevel projections easily performed. Since the basis degree is used to specify 
the multigrid level, an EBE strategy is natural for the multilevel technique. Results are 
presented for two candidate nonlinear elliptic transport problems: the augmented drift-
diffusion equations of semiconductor device modeling and the stream function-vorticity
equations of incompressible fluid dynamics.


Introduction 


Finite element methods in which refinement is accomplished by increasing the degree p of
the polynomial basis can give better error convergence rates, for similar computational
work, than the more commonly used h refinement schemes. However, the condition number
of the matrix deteriorates with increasing p. This motivates the need for an effective
preconditioner, and a multilevel scheme in which the basis degree serves as the grid level is
a natural choice. Hierarchic basis functions, which are constructed by adding appropriate
functions to the existing lower-degree polynomials, lead to matrices and vectors which are
nested. This may be a particularly suitable choice for multilevel methods, since the
projections for hierarchic multilevel schemes are easily performed at little computational cost
[8, 9].

Element-by-element (EBE) strategies have proven to be efficient and scalable for parallelization
of finite element methods using gradient iterative solvers [1, 2, 5, 6]. The basic idea in the
parallel EBE scheme is to avoid assembling the system and instead perform matrix-vector
and dot products in parallel at the element level. All matrices and vectors are stored in
element format. Moreover, in this approach, multilevel operations such as residual calculation,
restriction and prolongation can be confined to an element and hence require no
interprocessor communication. The only steps that require communication are the smoothing
iterations and the coarse level solve. A further advantage of spectral multilevel methods
is that the number of elements in the domain remains constant, and hence the decomposition
of the domain is fixed across grid levels. An important issue with parallel multilevel
methods defined in this way is the ratio of communication to calculation. Although this
ratio may be small for the fine level (high-degree basis), on coarser levels it gets successively
larger, and at some point the communication time may dominate the total computational
time. Further details of the p-type approach are given in [7, 8].

P-type Multilevel Scheme 


An alternative to refining the mesh by making the element size h smaller is to increase 
the degree p of the polynomial basis. This strategy results in exponential convergence 
when the solution is sufficiently regular. One disadvantage of the p-type finite element 
method, however, is that the conditioning of the matrix deteriorates with increasing p. 
This deterioration is dependent on the type of basis used. One way to counter this is to 
apply a preconditioner to the system. A p-type multilevel method may be defined by using 
the degree of the polynomial basis as the grid level. The intergrid transfers can then be 
naturally defined in terms of expansions in the appropriate bases. 

The analysis of a finite element Galerkin multilevel scheme is best carried out in the
variational setting. In this way the Galerkin statement can be formulated on each grid level,
and the consistency of the projection operators with the finite element discretizations on
the associated grid levels is assured. The approach here follows that in [4, 8]. We proceed
by considering a representative linear elliptic problem on a domain $\Omega$ with boundary $\partial\Omega$:

$$L(u) = f \quad \text{in } \Omega \tag{1}$$

$$u = g \quad \text{on } \partial\Omega \tag{2}$$

where $L$ denotes the differential operator. Applying the method of weighted residuals and
integrating by parts, the variational statement of the problem has the form: Find $u \in H$
with $u = g$ on $\partial\Omega$ such that

$$a(u, v) = f(v) \quad \forall v \in H \tag{3}$$

with $v = 0$ on $\partial\Omega$. Here $a(\cdot, \cdot)$ denotes the bilinear functional, $f(\cdot)$ is a linear functional
and $H$ is the appropriate space of admissible functions. Introducing a finite element
discretization and a polynomial basis so that $S^p \subset H$, we define the approximate variational
problem on grid level $p$ as: find $u_p \in S^p$ with $u_p = g$ on $\partial\Omega_p$ such that

$$a(u_p, v_p) = f(v_p) \quad \forall v_p \in S_0^p \tag{4}$$

where the subscript on $S_0^p$ indicates that the test functions $v_p = 0$ on $\partial\Omega_p$. Introducing the
finite element expansion and evaluating the integrals in (4) leads to a linear system of the
form

$$A_p u_p = b_p \tag{5}$$

where $p$ once again indicates the grid level. Now consider a multilevel scheme where (5)
corresponds to the fine grid system. Application of an iterative smoother to this system
yields an approximation $u_p^*$ and associated error $e_p^* = u_p - u_p^*$. Substituting this into (4),
the error $e_p^*$ is specified by the residual equation

$$a(e_p^*, v_p) = r^*(v_p) \quad \text{for all } v_p \in S_0^p \tag{6}$$

where

$$r^*(v_p) = f(v_p) - a(u_p^*, v_p). \tag{7}$$



Next, introduce a coarser level $q$ such that $S^q \subset S^p$. Since all $v_q$ are in $S^q$ and thus in
$S^p$, we can test against the set of bases $v_q$ [4], so the solution of (6) also satisfies the property

$$a(e_p^*, v_q) = r^*(v_q) \quad \text{for all } v_q \in S_0^q \tag{8}$$

where $r^*(v_q) = f(v_q) - a(u_p^*, v_q)$. This system is obviously underdetermined, so we take the
best (Galerkin) approximation $e_q^* \in S_0^q$ to $e_p^*$. That is, find $e_q^* \in S_0^q$ such that

$$a(e_q^*, v_q) = r^*(v_q) \quad \text{for all } v_q \in S_0^q \tag{9}$$

Substituting the finite element expansion in (9) yields the coarse level system for the
error correction vector

$$A_q e_q^* = r_q^* \tag{10}$$

where $A_q$ is computed by evaluating the bilinear form on the space $S^q$ and the right side
vector defines a natural projection of the residual from $S^p$ to $S^q$. More specifically, (7)
implies

$$r^*(v_q) = f(v_q) - a(u_p^*, v_q). \tag{11}$$

Note that this requires the $a(\cdot, \cdot)$ inner product of $u_p^*$ and $v_q$.

Introducing a polynomial expansion for $u_p^*$ and polynomial test function $v_q$

$$u_p^* = \sum_{j=1}^{N_p} (\beta_p^*)_j \, \phi_j^p(x), \qquad v_q = \phi_i^q(x) \tag{12}$$

where $\phi_j^p$ and $\phi_i^q$ denote the respective basis functions for $S^p$ and $S^q$ and $(\beta_p^*)_j$ are the nodal
degrees of freedom. Upon substitution in (11) this yields

$$r^*(\phi_i^q) = f(\phi_i^q) - \sum_{j=1}^{N_p} A_{ij}^{q,p} (\beta_p^*)_j \tag{13}$$

where $A_{ij}^{q,p} = a(\phi_j^p, \phi_i^q)$, or in matrix form

$$r_q^* = f_q - A^{q,p} \beta_p^* \tag{14}$$

Now $r_q^*$ in (14) can also be computed in a more traditional manner by developing a
projection of $r_p^*$ from the high level space to the coarser level space as follows: First, expand
the test function $\phi_i^q$ in the higher-dimensional basis as

$$\phi_i^q = \sum_{k=1}^{N_p} m_{ik}^{q,p} \phi_k^p \tag{15}$$

Then, substituting (15) into (13),

$$r^*(\phi_i^q) = \sum_{j=1}^{N_p} m_{ij}^{q,p} f(\phi_j^p) - \sum_{j=1}^{N_p} m_{ij}^{q,p} a(u_p^*, \phi_j^p) = \sum_{j=1}^{N_p} m_{ij}^{q,p} r^*(\phi_j^p) \tag{16}$$

or in matrix form

$$r_q^* = M^{q,p} (b_p - A_p u_p^*) = M^{q,p} r_p^* \tag{17}$$

At this point we need to determine the actual values of $m_{ij}^{q,p}$ in order to be able to carry
out the projection. First let us consider the standard Lagrange bases. These bases have the
interpolation property: the value of each basis function is one at the node corresponding to
the basis function, and zero at all other nodes, i.e. $\phi_i(x_j) = \delta_{ij}$. It follows that

$$\phi_i^q(x_j) = \sum_{k=1}^{N_p} m_{ik}^{q,p} \phi_k^p(x_j) = m_{ij}^{q,p} \tag{18}$$

and the components of the projection matrix $M^{q,p}$ are simply the values of the coarse grid
basis at the fine grid nodes.
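As a small illustration of (18), the following Python sketch builds the projection matrix for a single 1D reference element. It assumes a linear coarse basis on the nodes -1, 1 and a quadratic Lagrange fine basis on the nodes -1, 0, 1; the element, the node sets and the function names are assumptions made for illustration and do not come from the paper.

```python
def lagrange_basis(nodes):
    """Return the list of Lagrange basis functions for the given nodes."""
    def make(i):
        def phi(x):
            value = 1.0
            for k, xk in enumerate(nodes):
                if k != i:
                    value *= (x - xk) / (nodes[i] - xk)
            return value
        return phi
    return [make(i) for i in range(len(nodes))]


coarse_nodes = [-1.0, 1.0]        # degree q = 1 (assumed)
fine_nodes = [-1.0, 0.0, 1.0]     # degree p = 2 (assumed)
phi_q = lagrange_basis(coarse_nodes)
phi_p = lagrange_basis(fine_nodes)

# Equation (18): m_ij is the coarse basis function i evaluated at fine node j.
M = [[phi_q[i](xj) for xj in fine_nodes] for i in range(len(coarse_nodes))]


def expand(i, x):
    """Right side of (15): coarse function i rebuilt from the fine basis."""
    return sum(M[i][k] * phi_p[k](x) for k in range(len(fine_nodes)))
```

Here `M` has the rows (1, 0.5, 0) and (0, 0.5, 1), and `expand(i, x)` agrees with `phi_q[i](x)` at any point of the element, which is the change of basis (15).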

To complete the multilevel concept in the variational setting, a prolongation operator
is needed which will project the error correction in equation (10) to grid level $p$. A natural
choice for the prolongation operator is the transpose of the restriction operator in (17).
Then the fine grid correction approximating $e_p^*$ in $S^p$ is computed from the coarse grid
result according to

$$e_p = (M^{q,p})^T e_q^* \tag{19}$$

As in the standard multigrid method, these error corrections are added to the approximate
solution on the finer level to obtain the corrected approximation $u_p = u_p^* + e_p$ and smoothed
by fine grid iteration to get a new $u_p^*$ for the next V-cycle.

The advantages of hierarchic bases become apparent when we extend the previous multilevel
analysis to this setting [7, 8]. The change-of-basis coefficients in (15) for Lagrange
bases are simplified for hierarchic bases because the basis for the space $S^q$ is explicitly contained
in the basis for $S^p$. That is,

$$\phi_i^q = \phi_i^p, \quad 1 \le i \le N_q \tag{20}$$

which implies

$$m_{ij}^{q,p} = \delta_{ij}, \quad i = 1, \ldots, N_q, \quad j = 1, \ldots, N_p. \tag{21}$$

Since the higher-degree basis explicitly contains the lower-degree basis, the finite element
matrix and vector contributions corresponding to the lower-degree polynomials are nested in
the matrix and vector contributions for the higher-degree polynomials. Similarly, coarsening
implies simply deleting the appropriate rows and columns of the matrix. These properties
are useful in the multilevel context. More specifically, the residual projection in (13) becomes
$r^*(\phi_i^q) = r^*(\phi_i^p)$ for $i = 1, 2, \ldots, N_q$. That is, the components of the residual projection to
the subspace $S^q$ are trivially the first $N_q$ components of the fine grid residual. Hence, only
the first $N_q$ components of the residual vector need to be computed. Similarly, the coarse
grid matrix $A_q$ is now the leading $N_q \times N_q$ principal submatrix of the fine level matrix $A_p$. Hence $A_q$
does not need to be recomputed.

The subspace problem for the error correction in $S^q$ again has the form in (10). That
is,

$$A_q e_q^* = r_q^*. \tag{22}$$

In a two-level scheme this system is solved for $e_q^*$. The projection of $e_q^*$ to the higher level
space $S^p$ is trivial because of the explicit inclusion of the basis (recall (20)). Hence the
corrected high level approximation is simply obtained by adding the $N_q$ components of $e_q^*$
to the first $N_q$ components of $u_p^*$. This new approximation in $S^p$ can then be iteratively
smoothed and the cycle repeated.

Since $M^{q,p}$ extracts the first $N_q$ components of a vector of length $N_p$, equation (22) on
the coarse grid can then be expressed as

$$A_q e_q^* = M^{q,p} (b_p - A_p u_p^*) = b_q - M^{q,p} A_p u_p^* \tag{23}$$

An alternative to the standard error correction method described above takes advantage
of the nesting of the matrices and vectors [9] to operate directly on the associated
components of $u_p^*$ and $b_p$. First note that $M^{q,p} = [I \;\; 0]$, so (23) implies

$$A_q e_q^* = b_q - [A_q \;\; A_{qp}] \, u_p^* = b_q - A_q u_q^* - A_{qp} u_{pp}^* \tag{24}$$

where we have used the block partitioning

$$A_p = \begin{bmatrix} A_q & A_{qp} \\ A_{pq} & A_{pp} \end{bmatrix}, \qquad u_p^* = \begin{bmatrix} u_q^* \\ u_{pp}^* \end{bmatrix}. \tag{25}$$

Then (24) implies, on moving $A_q u_q^*$ to the left-hand side,

$$A_q u_q = b_q - A_{qp} u_{pp}^* \tag{26}$$

where $u_q = e_q^* + u_q^*$ is the subvector corresponding to the first $N_q$ components of the new
high-level iterate. This form has two advantages. First, it emphasizes the fact that the full
residual need not be computed. Second, no intermediate correction needs to be projected
and added to the fine level approximation.
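The equivalence of the correction form (22)-(23) and the direct form (26) can be checked on a toy system. The Python sketch below assumes a 3 x 3 fine-level matrix in hierarchic ordering whose first two unknowns belong to the coarse basis; the numbers and helper names are hypothetical and serve only to illustrate the algebra.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


NQ = 2                                  # number of coarse (lower-degree) unknowns
A_p = [[4.0, 1.0, 1.0],
       [1.0, 3.0, 0.5],
       [1.0, 0.5, 5.0]]
b_p = [1.0, 2.0, 3.0]
u_star = [0.1, 0.2, 0.3]                # current smoothed fine-level iterate

A_q = [row[:NQ] for row in A_p[:NQ]]    # leading block: nothing is recomputed
A_qp = [row[NQ:] for row in A_p[:NQ]]   # coupling block of partition (25)
u_pp = u_star[NQ:]

# Route 1: fine residual, truncation to the first NQ entries, solve (22), correct.
r_p = [bi - ai for bi, ai in zip(b_p, matvec(A_p, u_star))]
e_q = solve2(A_q, r_p[:NQ])
u_q_corrected = [u + e for u, e in zip(u_star[:NQ], e_q)]

# Route 2: equation (26), solved directly for the new coarse components.
rhs = [bi - ci for bi, ci in zip(b_p[:NQ], matvec(A_qp, u_pp))]
u_q_direct = solve2(A_q, rhs)
```

Both routes give the same first `NQ` components of the new iterate (up to rounding), while the second never forms the full residual.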


For reasons of convenience and parallelization, a simple point Jacobi iteration is the
preferred smoother for the multilevel scheme. Any smoother must efficiently damp the
high frequency error modes on the respective grids. For the relaxed Jacobi smoother, the
relaxation parameter determines which frequencies are damped more quickly than others.
If we assume that we wish to eliminate the highest frequency eigenmode corresponding to
the leading eigenvalue of the discrete operator, we obtain the relaxation factor for optimum
multilevel smoothing [7, 15]

$$\omega = \left[ \frac{(x, Ax)}{(x, Dx)} \right]^{-1} \tag{27}$$

where $D$ is a diagonal matrix with $D_{ii} = A_{ii}$.

Since this relaxation factor $\omega$ is a function of the matrix $A$, it changes with both the
problem and the discretization. Hence, the optimum relaxation needs to be repeatedly
calculated for each decoupled equation matrix. This value can be conveniently calculated
using a power series method.
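A sketch of how such a relaxation factor might be computed is given below in Python. It assumes that the "power series method" refers to standard power iteration applied to $D^{-1}A$, and that $x$ in (27) is the resulting leading eigenvector; both are assumptions, as are the function name and the small 1D Laplacian used as test matrix.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def jacobi_relaxation_factor(A, iterations=200):
    """Estimate omega = (x, Dx) / (x, Ax) with x the leading eigenvector of D^{-1} A."""
    n = len(A)
    diag = [A[i][i] for i in range(n)]
    x = [1.0] + [0.0] * (n - 1)                    # arbitrary start vector (assumed)
    for _ in range(iterations):
        y = matvec(A, x)
        x = [yi / di for yi, di in zip(y, diag)]   # one application of D^{-1} A
        scale = max(abs(v) for v in x)
        x = [v / scale for v in x]                 # normalize to avoid overflow
    Ax = matvec(A, x)
    x_A_x = sum(xi * yi for xi, yi in zip(x, Ax))
    x_D_x = sum(di * xi * xi for di, xi in zip(diag, x))
    return x_D_x / x_A_x


# Illustrative matrix: 1D Laplacian stencil (-1, 2, -1) on three unknowns.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
omega = jacobi_relaxation_factor(A)
```

For this matrix the leading eigenvalue of $D^{-1}A$ is $1 + \sqrt{2}/2$, so the computed factor is $2 - \sqrt{2} \approx 0.586$. Each iteration costs one matrix-vector product, which in the element-by-element setting also implies one exchange of shared-node data.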


There are two main choices for a multilevel strategy applied to a linear system: a
V (or W) cycle, or a full multigrid cycle. The full multigrid (FMV) cycle uses nested
iteration to improve the initial guess on successively finer grids. The strategy for solution of
the nonlinear problem uses block iteration and successive approximations. Hence, at each
nonlinear (or block) iteration, there exists a good initial guess on the fine grid. For this
reason only V-cycles are used here as a multigrid cycling scheme.
The Jacobi smoother can generate oscillations in the cross-wind direction for convection
dominated problems. The magnitude of these oscillations is proportional to the magnitude
of the residual. In order to minimize these oscillations, an initial coarse grid correction (no
pre-smooths) is performed at the first V-cycle. This initial correction further improves the
initial guess from the previous block iterate, and convergence is improved [7].

Multigrid cycling schemes such as the Full Approximation Scheme can be used on the 
full nonlinear problem. Two alternative approaches are used here for the nonlinear problem. 
First, the multilevel solver is used only as the linear system solver for the fine grid problem, 
which is run to convergence using successive approximations and continuation. The second 
approach is a nested iteration scheme: The coarsest grid problem is run to convergence 
on the full nonlinear problem, including continuation in the boundary voltage or Reynolds 
number. The solution is then projected to the next finer grid and the problem on this grid
level is then run to convergence at the final voltage or Reynolds number. This strategy is 
repeated until the highest grid level is reached. 


Parallelization 


Finite element methods divide a given problem domain into a union of elements for discrete 
solution. Hence, schemes in which blocks of elements are operated on by a processor and 
the processor decomposition follows element boundaries provide a natural way in which 
to parallelize finite element methods [1, 2, 5, 6]. Adjacent elements share nodes on the 
element interface, so the information associated with these nodes may be stored on different 
processors. This information is updated during matrix-vector product or inner product 
operations. This means that messages must be passed between processors in order to update 
these values. The ratio of communication to computation is important because it can limit 
efficiency. The use of high-p elements, which have more internal degrees of freedom, results 
in a higher computation to communication ratio compared to low-p elements. 

For a message passing paradigm, the time to send a message is given by

$$t_m = \alpha + \beta L_m \tag{28}$$

where $\alpha$ is the startup time or latency, $\beta$ is the time per byte for message transfer, and
$L_m$ is the length of the message in bytes. For transfers in which a large amount of data is
to be transferred, the key is to send as few messages as possible so that the startup time
is minimized. Otherwise the startup time may dominate the communication time. The
optimum situation would be to send one long message so that the latency is essentially
hidden.
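The effect of bundling under the cost model (28) can be made concrete with a few lines of Python. The latency and per-byte values below are purely illustrative placeholders, not measurements of any machine, and the function name is an assumption.

```python
def message_time(alpha, beta, length_bytes):
    """Cost model (28): startup time plus per-byte transfer time."""
    return alpha + beta * length_bytes


ALPHA_NS = 75_000          # hypothetical startup time, in nanoseconds
BETA_NS_PER_BYTE = 400     # hypothetical transfer time per byte, in nanoseconds
N_VALUES = 50              # shared-node values to exchange with one neighbor
BYTES_PER_VALUE = 8

# One message per value: the startup time is paid N_VALUES times.
separate = N_VALUES * message_time(ALPHA_NS, BETA_NS_PER_BYTE, BYTES_PER_VALUE)

# One bundled message: the startup time is paid once.
bundled = message_time(ALPHA_NS, BETA_NS_PER_BYTE, N_VALUES * BYTES_PER_VALUE)

saved = separate - bundled
```

The transferred volume is identical in both cases, so the saving is exactly (N_VALUES - 1) startup times.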

The previous argument motivates the need for message bundling using sendlists. A 
data structure is developed in which each processor has a pointer array which contains the 
element and node numbers that are shared with another processor. The order in which 
this information is to be placed into a message is also stored. Thus, when a vector is to be 
updated, a message vector is filled in order and sent to the appropriate processor. In turn, 
a message is received from that processor. A pointer array indicates which element and 
local node corresponds to which position in the array, in the same way as for the message 
which was sent. In this fashion all of the communication between adjacent processors can 
be accomplished using one message each way, and message latency is minimized. There is,
however, some overhead in the packing and unpacking phases.
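The packing and unpacking phases might look as follows in Python. This is only a sketch of the sendlist idea: the two "processors" are simulated by two lists in one process, the actual message transport is omitted, and the names `pack`, `unpack_add`, `list_p0` and `list_p1` are hypothetical.

```python
def pack(element_values, sendlist):
    """Copy shared-node values into one message, in sendlist order."""
    return [element_values[elem][node] for elem, node in sendlist]


def unpack_add(element_values, recvlist, message):
    """Add received partial sums to the matching local shared nodes."""
    for (elem, node), value in zip(recvlist, message):
        element_values[elem][node] += value


# Processor 0 holds one element whose local node 1 lies on the interface;
# processor 1 holds one element whose local node 0 is the same physical node.
values_p0 = [[4.0, 1.5]]
values_p1 = [[2.0, 7.0]]
list_p0 = [(0, 1)]   # (local element, local node) pairs shared with processor 1
list_p1 = [(0, 0)]   # matching pairs on processor 1, stored in the same order

# Both messages are packed before either side is updated.
msg_0_to_1 = pack(values_p0, list_p0)
msg_1_to_0 = pack(values_p1, list_p1)
unpack_add(values_p0, list_p0, msg_1_to_0)
unpack_add(values_p1, list_p1, msg_0_to_1)
```

After the exchange both processors hold the summed value 3.5 at the shared node, using a single message in each direction.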

In the present work we can use an element-type data structure and recast all matrix- 
vector or projection operations at the element level. This means that instead of addressing 
a vector by its global node number, it is addressed by its element and local node number. 
In addition, each element has a pointer array which stores its neighbor elements and which 
points are shared with this neighbor. A specific processor will store information only for 
elements local to that processor. Elements are therefore addressed by the number local to 
that processor rather than a global element number. The pointer array for neighbor infor- 
mation includes the local element number and processor number for neighboring elements. 
This format facilitates parallel coding. 

The formation of the matrix and RHS vector for finite element methods is usually 
accomplished by forming the local element matrices and vectors and summing them to get 
the global matrix and RHS as implied in the multilevel formulation of the previous sections. 
However, in the present parallel algorithm we no longer form the global matrix and RHS, but 
leave them in element form. The matrix and RHS calculation phase is therefore completely 
parallel. If the matrix is to be preconditioned using a global Jacobi preconditioner (diagonal
scaling), then the diagonal elements of the matrices may be assembled to find the scaling
factor. This accumulation phase will involve communication across processor boundaries.

Iteration by point iterative methods (Jacobi, SOR, etc.) as a smoother or gradient
methods (CG, BCG, etc.) for the coarse grid solve involves repeated matrix-vector
multiplications or dot products. Calculation of either one requires that the information on shared
nodes be updated. However, the multilevel residual calculation and projection operations
require no communication. The residual calculation on the fine grid is (see (17))

$$r_p = b_p - A_p u_p^* \tag{29}$$

This is seen, however, as a sum of element contributions. Let $A_p^e$, $r_p^e$, $b_p^e$ be the element
matrix and vectors, respectively. Introducing the Boolean adjacency or connectivity matrix
$B_e$, which relates global to local variables for element $e$,

$$b_p = \sum_{e=1}^{E} B_e^T b_p^e \tag{30}$$

and similarly

$$A_p = \sum_{e=1}^{E} B_e^T A_p^e B_e \tag{31}$$

Then, writing $u_p^e = B_e u_p^*$ for the element vector of the current iterate,

$$r_p = \sum_{e=1}^{E} B_e^T b_p^e - \sum_{e=1}^{E} B_e^T A_p^e B_e u_p^* = \sum_{e=1}^{E} B_e^T (b_p^e - A_p^e u_p^e) = \sum_{e=1}^{E} B_e^T r_p^e \tag{32}$$

and we can use directly the element residuals

$$r_p^e = b_p^e - A_p^e u_p^e \tag{33}$$

Note also that because the element bases are defined locally we can introduce a local change
of basis at the element level and corresponding to the global matrix $M^{q,p}$ in (17) or (21)
we have the element projection matrix $M_e^{q,p}$. Then the element residual projection follows
in a manner analogous to (17) as

$$r_q^e = M_e^{q,p} r_p^e \tag{34}$$

Thus residual calculation and restriction take place on the element level, without
communication, and are completely parallel operations. The prolongation to the finer grid operates on
the error vector, which is the solution on the coarser grid. This vector is stored in summed
format, and hence no updating is necessary. Therefore, prolongation can also take place on
an element and is once again completely parallel.
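The identity behind the element residuals can be verified on a tiny mesh. The Python sketch below assumes two 1D linear elements sharing one node and represents each Boolean matrix $B_e$ by a connectivity list (gather for $B_e$, scatter-add for $B_e^T$); the element matrices, load vectors and names are hypothetical.

```python
N_GLOBAL = 3
connectivity = [[0, 1], [1, 2]]                 # global node of each local node
A_e = [[[1.0, -1.0], [-1.0, 1.0]],
       [[2.0, -2.0], [-2.0, 2.0]]]              # element matrices (assumed)
b_e = [[0.5, 0.5], [1.0, 1.0]]                  # element load vectors (assumed)
u = [0.0, 1.0, 4.0]                             # current global iterate


def element_residual(e):
    """Element residual b_e - A_e u_e, with u_e gathered from u."""
    u_loc = [u[g] for g in connectivity[e]]
    return [b_e[e][i] - sum(A_e[e][i][j] * u_loc[j] for j in range(2))
            for i in range(2)]


def scatter_sum(element_vectors):
    """Sum element vectors into a global vector (action of the B_e^T)."""
    out = [0.0] * N_GLOBAL
    for e, vec in enumerate(element_vectors):
        for i, g in enumerate(connectivity[e]):
            out[g] += vec[i]
    return out


# Element-by-element residual: no global matrix is ever formed.
r_ebe = scatter_sum([element_residual(e) for e in range(2)])

# Reference computation with the assembled system, for comparison only.
A = [[0.0] * N_GLOBAL for _ in range(N_GLOBAL)]
for e, nodes in enumerate(connectivity):
    for i, gi in enumerate(nodes):
        for j, gj in enumerate(nodes):
            A[gi][gj] += A_e[e][i][j]
b = scatter_sum(b_e)
r_global = [b[i] - sum(A[i][j] * u[j] for j in range(N_GLOBAL))
            for i in range(N_GLOBAL)]
```

Both computations give the residual (1.5, 6.5, -5.0). In a parallel code the scatter step at shared nodes is where communication would occur, whereas the element residuals themselves are purely local.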


Results 

The above method is now formulated for two nonlinear, coupled transport problems. The
first test case is the augmented drift-diffusion equations, which model the flow of electrons
and holes in semiconductor devices. The steady state, scaled form of the equations is
[3, 12, 16, 17]

$$\begin{aligned} \lambda^2 \Delta\psi &= n - p - C \\ \nabla \cdot (\mu_n \nabla n - \mu_n n \nabla\psi) &= R \\ \nabla \cdot (\mu_p \nabla p + \mu_p p \nabla\psi) &= R \end{aligned} \tag{35}$$

where $\psi$ is the electrostatic potential, $n$ and $p$ are carrier concentrations, $\mu_n$ and $\mu_p$ are
mobilities, $R$ is the recombination-generation rate, $C$ is the doping, and $\lambda$ is the scaled
Debye length. The boundary conditions are Dirichlet at the contacts ($\psi$, $n$, $p$ specified) and
homogeneous Neumann elsewhere.

Equations (35) are decoupled iteratively and successive approximations used to solve 
the nonlinear problem [7, 8, 10]. At each nonlinear iteration, three linear subsystems are 
obtained, which are solved successively with a multilevel method using available solution 
iterates of the other field variables [7].

The model problem for the augmented drift-diffusion equations is an $n^+$-$n$-$n^+$ diode
with doping of $5 \times 10^{17}$ and $2 \times 10^{15}$ in the $n^+$ and $n$ regions, respectively, device length
of 0.3 μm, active length of 0.1 μm, and an applied bias of 0.5 V. Plots of the electrostatic
potential and electron concentration from source to drain contact are shown in Figure 1.
Although this is a 1-D problem, it was solved on a 2-D domain with homogeneous Neumann
conditions on the two horizontal sides. This solution was computed using a uniform 9 x 9
grid of 81 quintic elements, and a multilevel solver which used linear elements as the coarsest
level.

The second application is the stream function-vorticity equations for incompressible
Navier-Stokes flow in two dimensions. The steady state form of the equations is [7, 13, 14]

$$\begin{aligned} -\nu \Delta\zeta + \mathbf{u} \cdot \nabla\zeta &= f \\ -\Delta\psi &= \zeta \end{aligned} \tag{36}$$

where $\psi$ is the stream function, $\zeta$ is the vorticity, $\mathbf{u}$ is the velocity, and $f$ is the divergence
of the body force.

Figure 1: Potential and electron distribution solutions, $n^+$-$n$-$n^+$ diode, 0.5 V bias

Following the same procedure as in the previous problem, the equations are iteratively 
decoupled using successive approximations. Again, the linear systems arising from substi- 
tution of the appropriate basis and integration are solved with a multilevel scheme, and 
available solution iterates of the field variables are used. 



Figure 2: Stream function and vorticity contours, driven cavity, Re = 50 

The model problem for the stream function-vorticity equations is the driven cavity 
problem. The velocity of the top of the cavity is normalized to one and the viscosity is 
chosen so that the Reynolds number of the flow is 50. Contour plots for the stream function 
and vorticity are shown in Figure 2. The same grid and 5-level scheme were used as in the
previous case.

Calculation of the eigenvalue (or relaxation parameter) for the relaxed Jacobi smoother
in a power series scheme generally requires a moderate number of matrix-vector multiplies.
If this were done at each nonlinear iteration for each decoupled linear system, the cost would
quickly become a significant part of the total computation and communication time.
However, if the sparse systems corresponding to a particular equation do not change enough to
significantly alter this eigenvalue estimate over several block iterations, then the calculation
can be done infrequently, and the cost can be amortized over several nonlinear iterations.
In practice, this is found to be the case for both the augmented drift-diffusion and stream
function-vorticity equations. Hence the relaxation parameter is only recomputed every ten
block iterations, or at the start of a continuation step.


For multilevel schemes applied to a decoupled problem, there are two convergence rates 
of interest. The first is the convergence rate of the multilevel solver operating on a particular 
linear system. The second is the convergence rate of the successive approximation method 
applied to the nonlinear problem, the so-called block iterations. 

Figure 3 shows the $L_2$ norm of the residual at each fine grid smoothing step for the
augmented drift-diffusion problem, on grids of 576 quadratic elements and 81 quintic
elements, respectively. These two grids have approximately the same number of degrees of
freedom. The solver uses the number of levels equal to the degree of the fine grid basis. 
The plots display a sawtooth shape, with the beginning of each sawtooth corresponding to 
the formation of the new linear system at each nonlinear iteration. This is followed by a 
linear portion which represents the convergence of the multilevel scheme to the solution of 
the linear system. The large jump in each of the plots is the beginning of the new contin- 
uation step in applied voltage, which corresponds to a new problem. The linear behavior 
of multilevel convergence is evident and is to be expected. The convergence rate is better 
for the potential equation than for the transport equation. This is due to the fact that the 
transport equation leads to a linear system which is nonsymmetric. The envelope of the 
peaks is also decreasing, and this represents the convergence of the nonlinear iterations. 


Figure 3: Multigrid convergence (residual norm), augmented drift-diffusion, quadratic and quintic elements

The same types of behavior are demonstrated in Figure 4 for the stream function-vorticity
problem at Re = 50. The convergence of the nonlinear iterations is more defined
for this problem, and there is no big jump corresponding to a continuation step. Again,
the multilevel convergence is linear. The convergence of the linear system corresponding to
the transport equation at this low Reynolds number is not slower than that for the stream
function equation.

Figure 4: Multigrid convergence, stream function-vorticity, quadratic and quintic elements


Experiments were performed to test the performance of the nested iteration strategy 
on both model problems. The results indicate that the number of fine grid iterations is 
significantly reduced. The convergence behavior once the finest grid is reached is very 
similar to that shown previously for V-cycles without nested iteration. 

Figure 5 shows the speedup on the Intel iPSC/860 hypercube for the stream function-vorticity
problem. The speedups are presented for a grid of 1024 quadratic elements and a
grid of 64 quintic elements, with a Lagrange basis used in both instances. The processor
decomposition is performed by ordering the elements in the square domain naturally and
distributing them to the processors in order, i.e., the first $N_e/N_p$ elements go to the first
processor and so on, with $N_e$ the number of elements and $N_p$ the number of processors.
The speedup for fewer than 16 processors is very good, with a parallel efficiency of 0.83
for 8 processors. The deterioration of performance above this level is due to the smaller
problem sizes on each processor, meaning the communication-computation ratio is larger.
The speedups for the two grids are similar because the p = 5 grid has fewer grid points
(a smaller problem). With the same number of elements as in the p = 2 case, the p = 5
speedup would be expected to be better.
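
The natural-order decomposition and the efficiency figure quoted above can be illustrated with a minimal sketch. The function names `element_owner` and `parallel_efficiency` are hypothetical, and the sketch assumes that the number of processors divides the number of elements evenly, which need not match the bookkeeping of the actual code.

```python
def element_owner(element, n_elements, n_procs):
    """Processor owning `element` (0-based) under a natural-order block distribution."""
    if n_elements % n_procs != 0:
        raise ValueError("this sketch assumes n_procs divides n_elements")
    block = n_elements // n_procs        # N_e / N_p elements per processor
    return element // block


def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency is the speedup divided by the number of processors."""
    return speedup / n_procs
```

With 1024 elements on 8 processors, elements 0 through 127 go to processor 0, elements 128 through 255 to processor 1, and so on. An efficiency of 0.83 on 8 processors corresponds to a speedup of about 6.6.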


Conclusions 


The focus of the present study has been the use of a multilevel scheme for preconditioning
p-type finite element systems. We show that the spectral multilevel scheme serves as a useful
preconditioner for the fine grid discretization resulting from the application of spectral finite
elements. Convergence rates are linear for both chosen applications, and on both the
self-adjoint and transport equations. A simple point Jacobi smoother can be used provided the
correct relaxation is calculated for the corresponding problem and element degree. However,
the study of more advanced smoothers, especially for the convection-dominated transport
equations, is considered warranted.

Acknowledgments: This research has been supported in part by the Texas Advanced 
Technology Program and by the National Science Foundation. 
Figure 5: Speedup on the Intel iPSC/860, stream function-vorticity, Re = 100, Lagrange basis
J. E. Dendy, Jr.

Theoretical Division, Los Alamos National Laboratory 
Los Alamos, New Mexico 87545. 


SUMMARY 


The frequency decomposition multigrid method was previously considered and modified so 
as to obtain robustness for problems with discontinuous coefficients while retaining robustness 
for problems with anisotropic coefficients. The application of this modified method to a problem 
arising in global ocean modeling was also considered. For this problem it was shown that the 
discretization employed gives rise to an operator for which point relaxation is not robust. In fact, 
alternating line relaxation is required for robustness, negating the main advantage of the frequency 
decomposition method: robustness for anisotropic operators using only point relaxation. In this 
paper a semicoarsening variant, which requires line relaxation in one direction only, is considered,
and it is shown that this variant works well for the global ocean modeling problem. 

1 INTRODUCTION 


Let us consider multigrid with standard coarsening on a rectangular grid of points; that is,
the coarse grid offspring of a grid $\{x_{i,j} : i = 1, \ldots, m;\ j = 1, \ldots, n\}$ is the grid
$\{x_{2i-1,2j-1} : i = 1, \ldots, \lceil m/2 \rceil;\ j = 1, \ldots, \lceil n/2 \rceil\}$. If point
Gauss-Seidel with lexicographic ordering is the smoothing scheme, it is well-known [1] that
degradation in convergence occurs for the usual five point discretization of

$$ -a U_{xx} - b U_{yy} $$

when $0 < a \ll b$ or when $0 < b \ll a$. One cure is to use line Gauss-Seidel as a smoother [1].
Another is to use semicoarsening instead of standard coarsening [2, 3, 4, 5]. Still another is to 
employ algebraic multigrid [6, 7]. Of these three, only algebraic multigrid also handles the case of 
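the skew Laplacian, i.e.,

$$ -\Delta^{sk,h} = \frac{1}{2h^2} \begin{bmatrix} -1 & & -1 \\ & 4 & \\ -1 & & -1 \end{bmatrix}, \qquad (1.1) $$
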

but at the expense of having to use unstructured grids. Another multigrid scheme which handles 
both anisotropic coefficients and the skew Laplacian, using only standard coarsening and point 
Gauss-Seidel as the smoother, is the multigrid method considered by Brandt and Ta’asan [8]. The 
idea of the method, as described by Ta’asan, is as follows: when relaxation is slowly converging, 


* This work was performed under the auspices of the U.S. Department of Energy under contract 
W-7405- ENG-36 and was supported by the Office of Scientific Computing of the Department of 
Energy under Contract No. KC-07-01-01. 
the finest grid error must have the form

$$ V = V_0 + \sum_{j=1}^{n} e^{iS_j} V_j, $$

where the $V_j$ are smooth, the $e^{iS_j}$ are highly oscillatory, and $n < 2^d$ ($d$ being the
dimension of the problem). “This error cannot be approximated on a coarser grid, because it is
too oscillatory. Since [the] $V_j$ are smooth functions, they can be approximated on the next
coarser grid. Therefore, $n + 1$ coarse visits are done, each time solving for another $V_j$.” [8]
In the case that the $V_j$’s correspond to $(0,0)$, $(0,\pi)$, $(\pi,0)$, and $(\pi,\pi)$, Ta’asan argued
that on the $j$th coarse grid visit, the coarse grid equation should approximate the equation

$$ L_j^H V_j^H = R_j^H, $$

where

$$ L_j^H = I_h^H e^{-iS_j} L^h e^{iS_j} I_H^h $$

and

$$ R_j^H = I_h^H e^{-iS_j} R^h, $$

where $R^h$ is the residual on the fine grid with spacing $h$ and $I_H^h$ is bilinear interpolation
from the coarse grid with spacing $H$ ($= 2h$) to the fine grid. In this case $e^{iS_j} I_H^h$ is just
$I_H^h$ with some judicious sign changes. For specific cases, Ta’asan demonstrated that this
methodology could be simplified so that the coarse grid operators could be formed directly
instead of variationally. However, in the special case of $V_j$’s corresponding to $(0,0)$ and
$(\pi,\pi)$, [9] follows the methodology just described.


A variant of Brandt and Ta’asan’s method is the frequency decomposition multigrid method,
developed independently by Hackbusch [10]. To describe this method, let us assume doubly
periodic boundary conditions and suppose that the finest grid is the collection of points
$\Omega^M$ shown in Fig. 1. Subdivide $\Omega^M$ into the four sets
$\{\Omega^{M-1}_{k,l},\ k = 0, 1,\ l = 0, 1\}$ as shown. Define $I_{k,l} : \Omega^{M-1}_{k,l} \to \Omega^M$ by

$$ I_{k,l} = \frac{1}{4} \begin{bmatrix} (-1)^{k+l} & 2(-1)^{l} & (-1)^{k+l} \\ 2(-1)^{k} & 4 & 2(-1)^{k} \\ (-1)^{k+l} & 2(-1)^{l} & (-1)^{k+l} \end{bmatrix}, \qquad (1.2) $$

where periodicity is invoked near the boundaries. Define $L^{M-1}_{k,l} = I^*_{k,l} L^M I_{k,l}$ and let
$I^*_{k,l}$ be the residual weighting operator, $I^*_{k,l} : \Omega^M \to \Omega^{M-1}_{k,l}$. Thus a two level
method is given by:

1. Perform $\nu_1$ multi-color Gauss-Seidel iterations on $L^M u^M = F^M$.

2. Solve $L^{M-1}_{k,l} V^{M-1}_{k,l} = f^{M-1}_{k,l} = I^*_{k,l}(F^M - L^M u^M)$, $k = 0, 1$, $l = 0, 1$ directly.

3. Perform $u^M \leftarrow u^M + I_{k,l} V^{M-1}_{k,l}$, $k = 0, 1$, $l = 0, 1$.

4. Perform $\nu_2$ multi-color Gauss-Seidel iterations on $L^M u^M = F^M$.

The frequency decomposition multigrid method is given by applying this process recursively. That
is, instead of step 2, one decomposes each of the $\Omega^{M-1}_{k,l}$’s into four subsets and treats each
of these with the two level process, continuing until the grids have few enough points that direct
solution or solution by iteration alone is efficient.
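
As a small illustration of the four interpolation stencils, the sketch below builds them as 3 x 3 arrays. It assumes the stencil of (1.2) is the separable form (1/4)[(-1)^l, 2, (-1)^l]^T [(-1)^k, 2, (-1)^k], with horizontal neighbors carrying the factor (-1)^k and vertical neighbors the factor (-1)^l; the function name `fdm_prolongation_stencil` is hypothetical.

```python
import numpy as np


def fdm_prolongation_stencil(k, l):
    """3x3 interpolation stencil for coarse grid (k, l), with k, l in {0, 1}.

    Rows index the vertical offset and columns the horizontal offset.
    """
    sk, sl = (-1.0) ** k, (-1.0) ** l
    vertical = np.array([sl, 2.0, sl])      # weights of the lines above/at/below
    horizontal = np.array([sk, 2.0, sk])    # weights of the left/center/right points
    return np.outer(vertical, horizontal) / 4.0
```

For k = l = 0 this is ordinary bilinear interpolation. For the other three choices the stencil entries sum to zero, so those grids carry oscillatory rather than smooth corrections.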
Fig. 1: $\Omega^{M-1}_{i,j}$, $i = 0, 1$, $j = 0, 1$. $\Omega^{M-1}_{0,0}$ is designated
by solid dots, a second subset is designated by nonsolid squares, etc.


The frequency decomposition method is not robust for problems with discontinuous 
coefficients. In [11] we showed how to modify it to be robust for such problems while retaining 
robustness for problems with anisotropic coefficients. We also considered application of this 
modified method to a problem arising in global ocean modeling. For this problem it was shown 
that the discretization employed gives rise to an operator for which point relaxation is not robust. 
In fact, alternating line relaxation is required for robustness, negating the main advantage of 
the frequency decomposition method: robustness for anisotropic operators using only point 
relaxation. Given the necessity of performing alternating line relaxation, it is natural to consider 
a semicoarsening variant of the frequency decomposition multigrid method. In this variant, 
discussed in Section 2, the finest grid is coarsened only in the $y$-direction, and line relaxation by
lines in x is performed. This variant is robust for constant coefficient, anisotropic problems, but it 
must be modified, as in [11], to be robust for problems with discontinuous coefficients. In Section 3 
we consider the same numerical examples that were considered in [11]. In Section 4 we consider the 
application of this modified method to the same problem considered in [11] arising in global ocean 
modeling. 

2 A SEMICOARSENING FREQUENCY DECOMPOSITION MULTIGRID METHOD 


Let us consider multigrid with semicoarsening on a rectangular grid of points; that is, the
coarse grid offspring of a grid $\{x_{i,j} : i = 1, \ldots, m;\ j = 1, \ldots, n\}$ is the grid
$\{x_{i,2j-1} : i = 1, \ldots, m;\ j = 1, \ldots, \lceil n/2 \rceil\}$. The robustness of line relaxation
coupled with semicoarsening for constant coefficient anisotropic problems was first reported in [2].
For problems with anisotropic and discontinuous coefficients, a semicoarsening method was
considered in [3] for three-dimensional problems. The two-dimensional analogue of this method is
considered in [4] and [5]. Both of these papers use a technique due to Schaffer [12]; without this
technique, the semicoarsening method would not be competitive. However, this method is not
robust for operators like $-\Delta^{sk,h}$ in (1.1). (See the discussion in Section 1.)

To describe the semicoarsening frequency decomposition multigrid method (SFDM), let us
assume doubly periodic boundary conditions and suppose that the finest grid is the collection of
points $\Omega^M$ shown in Fig. 1. Subdivide $\Omega^M$ into the two sets $\{\Omega^{M-1}_k,\ k = 0, 1\}$,
where $\Omega^{M-1}_0$ is the set of odd $x$-lines and $\Omega^{M-1}_1$ the set of even lines of $\Omega^M$.
Define $I_k : \Omega^{M-1}_k \to \Omega^M$ by

$$ I_k = \frac{1}{2} \begin{bmatrix} (-1)^k \\ 2 \\ (-1)^k \end{bmatrix}, \qquad (2.1) $$

where periodicity is invoked near the boundaries. Define $L^{M-1}_k = I^*_k L^M I_k$ and let $I^*_k$ be
the residual weighting operator, $I^*_k : \Omega^M \to \Omega^{M-1}_k$. Thus a two level method is given by:

1. Perform $\nu_1$ red-black Gauss-Seidel line iterations, by lines in $x$, on $L^M u^M = F^M$.

2. Solve $L^{M-1}_k V^{M-1}_k = f^{M-1}_k = I^*_k (F^M - L^M u^M)$, $k = 0, 1$ directly.

3. Perform $u^M \leftarrow u^M + I_k V^{M-1}_k$, $k = 0, 1$.

4. Perform $\nu_2$ red-black Gauss-Seidel line iterations, by lines in $x$, on $L^M u^M = F^M$.

The semicoarsening frequency decomposition multigrid method is given by applying this process
recursively. That is, instead of step 2, one decomposes each of the $\Omega^{M-1}_k$’s into two subsets
and treats each of these with the two level process, continuing until the coarsest grid consists of a
collection of decoupled sets, each set consisting of just one $x$-line.
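
The two semicoarsening interpolation operators can be written out as matrices in a short sketch. It assumes that the interpolation stencil is (1/2)[(-1)^k, 2, (-1)^k]^T acting across $x$-lines, that the grid is periodic in $y$ with an even number of lines, and that unknowns are ordered line by line; the function name `sfdm_prolongations` and the 0-based indexing (so the "odd" lines of the text are indices 0, 2, 4, ...) are choices made for this sketch only.

```python
import numpy as np


def sfdm_prolongations(m, n):
    """Dense prolongation matrices (P0, P1) for m points per x-line and n x-lines.

    Column J of P_k places weight 1 on fine line 2J + k and weight
    (-1)**k / 2 on its two neighboring lines (periodic in y).
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of x-lines")
    nc = n // 2
    per_line = []
    for k in (0, 1):
        Py = np.zeros((n, nc))
        w = 0.5 * (-1.0) ** k
        for J in range(nc):
            j = 2 * J + k                  # fine line coinciding with coarse line J
            Py[j, J] += 1.0
            Py[(j - 1) % n, J] += w
            Py[(j + 1) % n, J] += w
        per_line.append(Py)
    identity_x = np.eye(m)                 # no coarsening in the x-direction
    return np.kron(per_line[0], identity_x), np.kron(per_line[1], identity_x)
```

Under these assumptions P0 reproduces constants, while P1 maps a constant coarse function to one that alternates in sign from line to line, which is the frequency the second grid is meant to capture. A Galerkin coarse operator for a fine matrix `L` would then be `P.T @ L @ P` for each of the two matrices.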


Since the frequency decomposition method is not robust for problems with discontinuous 
coefficients, one would hardly expect SFDM to be robust for such problems. We use the same 
numerical example employed in [11] to show in Section 3 that this expectation is justified. The key 
ingredient for obtaining robustness for problems with discontinuous coefficients is to use operator 
induced interpolation. The other ingredient is to use Galerkin coarsening, but that ingredient is 
already present here. 


Let us first recall how operator-induced interpolation is defined in the case of semicoarsening
[4, 12, 5] for nine point operators. It suffices to consider the two level method; let the template of
$L^M$ at a given point be

$$ \begin{bmatrix} NW & N & NE \\ W & C & E \\ SW & S & SE \end{bmatrix}. \qquad (2.2) $$

For this discussion we do not need to introduce indices. For step 3 above, $I_0$ is just the identity for
odd lines; for even lines, let

$$ A^- V^- + A^0 V^0 + A^+ V^+ = 0 $$

be the equation that would give the row $V^0 = (V_{i,j} : i = 1, \ldots, M)$ in terms of the rows
$V^- = (V_{i,j-1} : i = 1, \ldots, M)$ and $V^+ = (V_{i,j+1} : i = 1, \ldots, M)$, for $j$ even. Here $A^-$,
$A^0$, and $A^+$ are all tridiagonal matrices;

$$ A^- = \mathrm{tridiag}(SW\ \ S\ \ SE), \quad A^0 = \mathrm{tridiag}(W\ \ C\ \ E), \quad \text{and} \quad A^+ = \mathrm{tridiag}(NW\ \ N\ \ NE). \qquad (2.3) $$

Then

$$ V^0 = -(A^0)^{-1} (A^- V^- + A^+ V^+). \qquad (2.4) $$

Unfortunately, use of (2.4) yields a nonsparse interpolation, leading to nonsparse coarse
grid operators. Schaffer’s idea [12] is to assume that $-(A^0)^{-1} A^-$ and $-(A^0)^{-1} A^+$ can each be
approximated by diagonal matrices in the sense that $B^-$ and $B^+$ are diagonal matrices such that

$$ -(A^0)^{-1} A^- e = B^- e \quad \text{and} \quad -(A^0)^{-1} A^+ e = B^+ e, $$

where $e$ is the vector $(1, \ldots, 1)^T$. To find $B^-$ and $B^+$ requires just two tridiagonal solves. The
interpolation formula for $I_0$ using $B^-$ and $B^+$ is

$$ V^0 = B^- V^- + B^+ V^+. $$


The derivation for $I_1$ is a bit different. If we consider just $\Omega^{M-1}_0$, we have an ordinary
semicoarsening multigrid method, which is robust for operators which annihilate $(0,0)$, $(0,\pi)$, and
$(\pi,0)$. It is not robust for operators which annihilate $(\pi,\pi)$; to obtain such robustness is the
role of $\Omega^{M-1}_1$. We can repeat the above argument, except that now $e$ is the vector
$(-1, 1, -1, 1, \ldots)^T$. The interpolation formula for $I_1$ using the resulting $B^-$ and $B^+$ is

$$ V^0 = -|B^-| V^- - |B^+| V^+, \qquad (2.5) $$

where $|B^+|$ [$|B^-|$] is the diagonal matrix whose entries are the absolute values of the
corresponding entries of $B^+$ [$B^-$].
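
The two tridiagonal solves behind $B^-$ and $B^+$ can be sketched as follows. The sketch assumes constant coefficients, periodic lines, and the sign convention $-(A^0)^{-1} A^{\pm} e = B^{\pm} e$; dense solves stand in for a tridiagonal solver, and the helper names `periodic_tridiag` and `schaffer_diagonals` are hypothetical.

```python
import numpy as np


def periodic_tridiag(lower, diag, upper, n):
    """Constant-coefficient periodic tridiagonal matrix tridiag(lower, diag, upper)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i - 1) % n] += lower
        A[i, i] += diag
        A[i, (i + 1) % n] += upper
    return A


def schaffer_diagonals(A_minus, A_zero, A_plus, e):
    """Diagonal entries of B^- and B^+ with -(A0)^{-1} A^- e = B^- e, and likewise for +.

    The vector e must have no zero entries (all ones, or alternating signs).
    """
    b_minus = -np.linalg.solve(A_zero, A_minus @ e) / e
    b_plus = -np.linalg.solve(A_zero, A_plus @ e) / e
    return b_minus, b_plus
```

For the five point Laplacian with $e = (1, \ldots, 1)^T$ both diagonals come out as 1/2. For a stencil with center 4 and corners -1 (the skew Laplacian up to scaling) and the alternating vector $e$, the entries are -1/2, so their absolute values in (2.5) are again 1/2.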


It can be checked that in the case of constant coefficient zero-sum nine point difference
operators, this construction gives (2.1). The same procedure is used recursively in the multigrid
case. We use the notation

$$ \Omega^{M-k}_{j_1, \ldots, j_k}, \quad j_i = 0 \text{ or } 1 \qquad (2.6) $$

to denote the general level $M - k$ grid, $k = 1, \ldots, M - 1$. In analogy with the terminology used in
[11] we refer to this modification of SFDM as CSFDM for “child of the semicoarsening frequency
decomposition multigrid method.”


There are some problems for which the presence of $\Omega^{M-1}_1$ contaminates the solution process
and leads to slower convergence. Examples are given in Section 3. An analogous situation occurs
in [11]. There the solution was to design switches to detect the strength of certain frequencies and
to include the corresponding corrections with strength $\phi$, $0 \le \phi \le 1$. The same solution is
employed here. Consider $\Omega^{M-1}_1$. Define

$$ \phi = \max\left\{ 0,\ 1 - \frac{|C + SW + NW + SE + NE|}{|C + W + S + E + N|} \right\}. $$

(In this description we ignore the possibility of zero divides to simplify the exposition.) Thus, (2.5)
is replaced by

$$ V^0 = \phi \left( -|B^-| V^- - |B^+| V^+ \right). $$

Note that $\phi$ is 0 for the standard five point discretization of the Laplacian and 1 for
$-\Delta^{sk,h}$. We refer to this modification of CSFDM as GSFDM, for grandchild of the
semicoarsening frequency decomposition multigrid method.
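
A minimal sketch of the switch is given below. The stencil is passed as a dictionary keyed by the compass labels of (2.2). The text ignores zero divides; returning 0 when the denominator vanishes is an assumption of this sketch (it matches the stated value for the five point Laplacian), and the function name `frequency_switch` is hypothetical.

```python
def frequency_switch(st):
    """Switch phi = max(0, 1 - |C+SW+NW+SE+NE| / |C+W+S+E+N|) for a nine point stencil."""
    numerator = abs(st["C"] + st["SW"] + st["NW"] + st["SE"] + st["NE"])
    denominator = abs(st["C"] + st["W"] + st["S"] + st["E"] + st["N"])
    if denominator == 0.0:
        # Assumption: treat the quotient as infinite, so the switch is off.
        return 0.0
    return max(0.0, 1.0 - numerator / denominator)
```

For the five point Laplacian (center 4, edges -1, corners 0) the denominator vanishes and the sketch returns 0; for a stencil with center 4, corners -1 and edges 0 it returns 1.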


3 NUMERICAL EXAMPLES 


All of these examples appeared in [11]. They are for problems that are 64 x 64 in size, this size
problem being sufficient to illustrate the points we are making. We consider five problems. The
first is

$$ -\nabla \cdot (D(x,y) \nabla U(x,y)) + \sigma(x,y) U(x,y) = F(x,y) $$

in a bounded region $\Omega$ of $R^2$, where $D = (D_1, D_2)$, $D_i$ is positive, $i = 1, 2$, and $D_i$,
$\sigma$, and $F$ are allowed to be discontinuous across internal boundaries $\Gamma$ of $\Omega$; moreover,
$D_1 \gg D_2$ and $D_1 \ll D_2$ in different subregions of $\Omega$ is possible. Specifically, we consider

$$ \begin{cases} -\nabla \cdot (D \nabla U) + U = F & \text{on } (0, 16.) \times (0., 16.) \\ U \text{ doubly periodic,} \end{cases} \qquad (3.1) $$

for the region shown in Fig. 2 and for the values of $D = D_1 = D_2$ and $F$ indicated there. The
differencing employed is given in [13].



Fig. 2: Diffusion coefficients and right hand side for (3.1) (one subregion is labeled D = 1000.)
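The second is the standard discretization of

$$ \begin{cases} -U_{xx} - .001\, U_{yy} = F & \text{on } (0, 16.) \times (0., 16.) \\ U \text{ doubly periodic,} \end{cases} \qquad (3.2) $$

where $F$ is chosen so that $\int F = 0$, specifically

$$ F(x,y) = \begin{cases} 1. & \text{if } 0. < y < 4. \text{ or } 12. < y < 16. \\ -1. & \text{otherwise.} \end{cases} \qquad (3.3) $$
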

The third problem is

$$ \begin{cases} -\Delta^{sk,h} U = F & \text{on } (0, 16.) \times (0., 16.) \\ U \text{ doubly periodic,} \end{cases} \qquad (3.4) $$

where $\Delta^{sk,h}$ is given in (1.1) and $F$ is given in (3.3). We note that for (3.4) to have a
solution, $F$ must also satisfy $\sum_{i,j} (-1)^{i+j} F_{i,j} = 0$; this condition is fortuitously
satisfied by (3.3).


The fourth problem has anisotropic and discontinuous coefficients,

$$ \begin{cases} -\nabla \cdot (D \nabla U) + U = F & \text{on } (0., 16.) \times (0., 16.) \\ U \text{ doubly periodic,} \end{cases} \qquad (3.5) $$

with the coefficients and right hand side given in Fig. 3. The differencing employed is given in [13].


Fig. 3: Diffusion coefficients and right hand side for (3.5). Legible subregion labels: F = 0.;
D1 = 100, D2 = 100; D1 = 1000, D2 = 100; D1 = 1000., D2 = 1000.


The fifth problem comes from [8]. We consider the operator $L^{h,a}$ with template



where $|a| < 1$. We consider

$$ \begin{cases} L^{h,.95} U = F & \text{on } (0, 16.) \times (0., 16.) \\ U \text{ doubly periodic,} \end{cases} \qquad (3.6) $$

where $F$ is given in (3.3).

Table 1 shows the results for SFDM for these five problems. The first column indicates the
problem; the second, the number of V-cycles (less than eleven) needed until the final residual $r$
satisfies $\|r\| < 10^{-6}$; the third, the convergence factor of the first cycle; the fourth, the
convergence factor of the last cycle; and the last, the average convergence factor. (Recall that the
average convergence factor for $p$ V-cycles is defined as $(\|r_p\| / \|r_0\|)^{1/p}$, where $\|\cdot\|$
is the discrete $L^2$ norm, and $r_k$ is the residual on the finest grid after $k$ V-cycles.) An initial
guess of zero is used. Red-black line relaxation by lines in $x$ is used on all grids. The V-cycle
employed uses $\nu_1 = \nu_2 = 1$. In Tables 2 and 3 we give the same data for CSFDM and GSFDM.
One can see that CSFDM and GSFDM perform much better than SFDM for (3.1) without degradation
in convergence factor for the other problems. The difference between CSFDM and GSFDM is
dramatic only for (3.2), in contrast to the corresponding methods in [11]. Convergence factors for
the semicoarsening variants are comparable to those in [11] and even significantly better for (3.2)
and (3.5).
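
The three convergence factors reported in the tables can be computed from a residual history as in the following sketch. It assumes the list starts with the initial residual norm $\|r_0\|$ and contains one entry per V-cycle thereafter; the function name `convergence_factors` is hypothetical.

```python
import numpy as np


def convergence_factors(residual_norms):
    """Return (first-cycle CF, last-cycle CF, average CF) from norms ||r_0||, ..., ||r_p||."""
    r = np.asarray(residual_norms, dtype=float)
    p = len(r) - 1                               # number of V-cycles performed
    if p < 1:
        raise ValueError("need the initial residual and at least one cycle")
    per_cycle = r[1:] / r[:-1]                   # ||r_k|| / ||r_{k-1}||
    average = (r[-1] / r[0]) ** (1.0 / p)        # (||r_p|| / ||r_0||)^(1/p)
    return per_cycle[0], per_cycle[-1], average
```

The average factor is the geometric mean of the per-cycle factors, so it lies between the smallest and largest of them.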

TABLE 1: PERFORMANCE OF SFDM FOR FIVE PROBLEMS

Problem   Number of Cycles   CF, First Cycle   CF, Last Cycle   Average CF
(3.1)     10*                .38               .55              .53
(3.2)     7                  .08               .08              .08
(3.4)     7                  .08               .08              .08
(3.5)     10*                .34               .59              .56
(3.6)     6                  .04               .07              .05

* fails to converge in ten cycles


TABLE 2: PERFORMANCE OF CSFDM FOR FIVE PROBLEMS

Problem   Number of Cycles   CF, First Cycle   CF, Last Cycle   Average CF
(3.1)     9                  .10               .15              .14
(3.2)     7                  .08               .08              .08
(3.4)     7                  .08               .08              .08
(3.5)     10*                .10               .21              .19
(3.6)     6                  .04               .07              .05

* fails to converge in ten cycles




TABLE 3: PERFORMANCE OF GSFDM FOR FIVE PROBLEMS

Problem   Number of Cycles   CF, First Cycle   CF, Last Cycle   Average CF
(3.1)     9                  .08               .14              .13
(3.2)     1                  3.0 x 10^-9       3.0 x 10^-9      3.0 x 10^-9
(3.4)     6                  .03               .06              .05
(3.5)     10                 .08               .19              .18
(3.6)     6                  .03               .06              .05


The coarsest grid problem in all three variants consists of a collection of decoupled sets, each
set consisting of just one $x$-line. If the problem is nonsingular, there is no difficulty in solving the
associated periodic tridiagonal systems. If the problem is singular, then the tridiagonal system for
$\Omega^{1}_{0,\ldots,0}$ (see (2.6)) is singular. To attain uniqueness one need only add a positive number
to one of the diagonals of this tridiagonal system, pinning down the solution for this grid and thus
assuring a unique solution. Such a problem, of course, has a solution determined only up to a
constant; i.e., the computed solution plus any constant is still a solution. A similar technique is used
in [14]. In the case of (1.1), the tridiagonal system for $\Omega^{1}_{1,\ldots,1}$ is singular; addition of a
positive number to one of the diagonals is all that is required in this case as well.
For parallelization of SFDM and its offspring on the CM-5, we lay the grids out in the
obvious way. Efficient communication in x on all grids is obvious since each point communicates
only with its nearest left and right neighbors. Communication in y is efficient since each point
communicates with bottom and top neighbors a power of two distant, and with the immediate left
and right neighbors of these points. Like the methods in [11, 15, 10], the methods here keep all 
the processors busy on every grid level, and again this busyness is actually a disadvantage when 
the number of points per processor exceeds one (vp ratio greater than one), for then the virtual 
processors are kept busy on every level as well. In the method of [4], work is halved on each coarser 
level until a vp ratio of one is reached: from then on, work on each level remains constant, with 
more and more processors becoming idle. But for SFDM and its offspring, work on each level 
remains constant regardless of the vp ratio. For the method of [4] and a vp-ratio > 1, it is possible 
to organize the problem so that efficient relaxation can be achieved per processor and — by doing 
intra-processor moves — still achieve efficiency for interpolation and residual weighting; most of 
the communication is done within individual processors, not between processors. But for SFDM 
and its offspring as organized here, for sufficiently coarse levels, one is forced to pay the same off 
processor communication penalty for every point of every grid. 


4 APPLICATION TO A GLOBAL OCEAN MODELING PROBLEM 


The original motivation for this work came from an application in global ocean modeling. In
[16] an elliptic equation is solved at each time step. This equation is differenced so that the $(\pi, \pi)$
frequency is in the null space of the operator. The reason for this differencing is that it is required
for an energy conservation relation that is deemed to be important to long time integration of the
system. This differencing is common in the meteorological community, although some rebels are
attempting to introduce new models which do not employ it. There are other difficulties as well.
Since spherical coordinates are employed (fortunately with the regions near the poles left out), the
difference stencil (when normalized) is close to $L^{h,a}$ (see (3.5)), with $|a|$ close to 1, in some regions.
The diffusion coefficient depends on the depth of the ocean. On the scale of the grids used, this 
depth jumps no more than a factor of a hundred from cell to cell. Land masses are dealt with by 
the use of dead cells; that is, on land the equation that is solved is (Id)U = 0., where Id is the 
identity operator. The presence of dead cells and discontinuous coefficients really rules out the 
use of SFDM. Both CSFDM and GSFDM provide a mechanism for assuring that the coarse grid 
dead cells do not couple to the coarse grid ocean cells. The final difficulty is that the boundaries, 
approximated by lines of constant latitude and longitude, are ragged — coastlines tend to be 
fractal. 

Because of the existence of lines of latitude that intersect no land masses, for which periodic
boundary conditions are imposed, we need an efficient solver for periodic tridiagonal systems. Such
a solver is still not available in CMSSL (Connection Machine Scientific Software Library). Thus
we still employ a trick due to R. D. Richtmyer [17]: Let the unknowns of the periodic tridiagonal
system be indicated by $\{x_1, \ldots, x_m\}$. Set $x_m = 0.$, and solve for $\{x_1, \ldots, x_{m-1}\}$, denoting
the solution by $s^0$. Set $x_m = 1.$, and solve for $\{x_1, \ldots, x_{m-1}\}$, denoting the solution by $s^1$.
(The CMSSL tridiagonal solution algorithms can be used to solve for $s^0$ and $s^1$.) Every linear
combination of $s^0$ and $s^1$ has zero residual for $\{2, \ldots, m - 1\}$. It is easy to construct the linear
combination that has zero residual at 1 and $m$ as well. This linear combination involves division by
the difference of residuals of the system at 1 for $s^0$ and $s^1$; this can involve the difference of two
small, nearly equal numbers, so that the tridiagonal system is solved to only limited precision. The
cure is to use the obvious defect correction algorithm to obtain more digits of accuracy. In [11] the
better conditioning of the coarse grid operators (in comparison with the operators obtained from
semicoarsening) resulted in not having to use this defect correction algorithm.
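
The following sketch shows one way to realize the two-solve trick together with defect correction. It is a variant chosen for simplicity and may differ in bookkeeping from the description above: the two auxiliary solves enforce equations 1 through m-1 (0-based rows 0 through m-2) with the last unknown fixed at 0 and 1, and the combination is chosen to zero the residual of the last equation. A dense solve stands in for a tridiagonal solver, and all function names are hypothetical.

```python
import numpy as np


def apply_periodic_tridiagonal(a, b, c, x):
    """Return y_i = a_i x_{i-1} + b_i x_i + c_i x_{i+1}, with indices taken mod m."""
    return a * np.roll(x, 1) + b * x + c * np.roll(x, -1)


def solve_periodic_tridiagonal(a, b, c, d):
    """Solve the periodic tridiagonal system using two non-periodic solves."""
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    m = len(b)
    if m < 3:
        raise ValueError("this sketch assumes at least three unknowns")
    # Non-periodic tridiagonal block for the first m-1 unknowns.
    T = np.diag(b[:-1]) + np.diag(a[1:-1], -1) + np.diag(c[:-2], 1)

    def solve_with_last(t):
        # Fix the last unknown at t and move its couplings to the right hand side.
        rhs = d[:-1].copy()
        rhs[0] -= a[0] * t
        rhs[-1] -= c[-2] * t
        return np.append(np.linalg.solve(T, rhs), t)

    def last_residual(x):
        return d[-1] - (a[-1] * x[-2] + b[-1] * x[-1] + c[-1] * x[0])

    s0, s1 = solve_with_last(0.0), solve_with_last(1.0)
    r0, r1 = last_residual(s0), last_residual(s1)
    t = r0 / (r0 - r1)          # division by a difference of residuals
    return (1.0 - t) * s0 + t * s1


def solve_with_defect_correction(a, b, c, d, sweeps=1):
    """Improve the solution by re-solving for the residual `sweeps` times."""
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    x = solve_periodic_tridiagonal(a, b, c, d)
    for _ in range(sweeps):
        r = d - apply_periodic_tridiagonal(a, b, c, x)
        x = x + solve_periodic_tridiagonal(a, b, c, r)
    return x
```

The division by `r0 - r1` is where cancellation can occur when the two residuals are small and nearly equal; each defect correction sweep re-solves for the remaining residual to recover lost digits.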

In the original model, the solution of a steady state, zero row-sum, discrete, elliptic equation,
call it $L^h U^h = F^h$, was required at each time step. The problem of generating a compatible right
hand side $F^h$ for testing was solved by applying the difference operator to a random grid function;
the $F^h$ thus generated satisfies $\sum_{i,j} F^h_{i,j} = 0$ and $\sum_{i,j} (-1)^{i+j} F^h_{i,j} = 0$. In [11]
many simplified situations were investigated, with the intent of showing that the reason for poor
convergence for the actual problem was poor approximation on coarse grids due to the complicated
boundary. We omit the investigation of these simplified situations here since the behavior of the
semicoarsening variants parallels the behavior of the methods in [11].
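
The construction of a compatible right hand side can be illustrated on a stand-in operator. The sketch below does not use the ocean model operator; it assumes a doubly periodic grid with even dimensions and uses a skew Laplacian stencil (center 4, corners -1, scaled by 1/(2h^2)), which has both the constant and the $(\pi, \pi)$ mode in its null space. The function names are hypothetical.

```python
import numpy as np


def apply_skew_laplacian(u, h=1.0):
    """Apply the stencil (4 u_ij - sum of the four diagonal neighbors) / (2 h^2), periodically."""
    up, down = np.roll(u, 1, axis=0), np.roll(u, -1, axis=0)
    corners = (np.roll(up, 1, axis=1) + np.roll(up, -1, axis=1)
               + np.roll(down, 1, axis=1) + np.roll(down, -1, axis=1))
    return (4.0 * u - corners) / (2.0 * h * h)


def compatible_rhs(shape, seed=0):
    """Right hand side obtained by applying the operator to a random grid function."""
    rng = np.random.default_rng(seed)
    return apply_skew_laplacian(rng.standard_normal(shape))
```

Because the stand-in operator is symmetric and annihilates both modes, any right hand side in its range is orthogonal to them, so both sums vanish up to rounding error.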

The original model was improved by requiring the solution of a time-dependent equation [18].
Thus at the $n$th time step, one must solve

$$ \frac{G}{(\Delta t)^2} U^{h,n} + L^h U^{h,n} = F^{h,n}, \qquad (4.1) $$

where $G_{ij}$ = const. $\times$ (area of $(i,j)$th cell). In this model the size of the time step, $\Delta t$, is
limited by a Courant condition. For the 256 x 128 problem considered here, the ratio of
$G/(\Delta t)^2$ to the diagonal of $L^h$ ranges from .01 to 35.0 for the active cells, with a mean value,
including dead cells, of .3. There is no apparent correlation of the value of this ratio with the
location of the boundaries, but it was clear in [11] that the addition of this time step term to the
operator greatly improves the correction capabilities of the coarse grid operators. However, it was
also shown that the time step is not large enough to achieve a good convergence factor with
relaxation alone. As in Section 3, a zero initial guess is used.


TABLE 4: PERFORMANCE FOR THE GLOBAL OCEAN PROBLEM (4.1)

Method    Number of Cycles   CF, First Cycle   CF, Last Cycle   Average CF
CSFDM     10*                .03               .60              .34
GSFDM     10*                .03               .63              .35
CSFDMA    10                 .02               .42              .27
CSFDMB    5                  .02               .12              .05
CSFDMC    5                  .02               .11              .05

* fails to converge in ten cycles


The performance of CSFDM and GSFDM is in sharp contrast to the situation in [11],
where the addition of the time step term results in very good convergence. There are three variants
of CSFDM listed in Table 4, CSFDMA, CSFDMB, and CSFDMC, the last two of which give
convergence equal to what was achieved in [11] with alternating line relaxation. To motivate and
explain these variants, it is necessary to recall the construction of operator induced interpolation in 
the case of standard coarsening black box multigrid [13, 9, 19, 14]: At coarse grid points coinciding 
with fine grid points, interpolation is just the identity. At a fine grid point lying vertically between 
two coarse grid points, interpolation at $v_{i,j}$ is given by $a\, v_{i,j-1} + b\, v_{i,j+1}$, where

$$ a = -(SW + S + SE)/(W + C + E) \quad \text{and} \quad b = -(NW + N + NE)/(W + C + E) \qquad (4.2) $$

and where we have used the notation of (2.2). That is, one thinks of summing away the
$x$-dependence to obtain a three point relation between $v_{i,j-1}$, $v_{i,j}$, and $v_{i,j+1}$. A difficulty
with this approach, when using standard coarsening, is that if
$p = C + NW + N + NE + W + E + SW + S + SE$ is small, then instead of using $W + C + E$ in (4.2),
one should use $-SW - S - SE - NW - N - NE$; this point is discussed in [14].
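
The interpolation weights of (4.2), with the replacement of the denominator when the row sum is small, can be sketched as follows. The text does not say how small is "small"; the relative threshold `small` used here is an assumption of this sketch, as is the function name `vertical_interpolation_weights`. The stencil is a dictionary with exactly the nine compass keys of (2.2).

```python
def vertical_interpolation_weights(st, small=1e-3):
    """Weights (a, b) multiplying the values below and above a fine grid point."""
    below = st["SW"] + st["S"] + st["SE"]
    above = st["NW"] + st["N"] + st["NE"]
    row_sum = sum(st.values())                 # the quantity p in the text
    if abs(row_sum) <= small * abs(st["C"]):
        denominator = -(below + above)         # replacement for W + C + E
    else:
        denominator = st["W"] + st["C"] + st["E"]
    return -below / denominator, -above / denominator
```

When the replacement denominator is used, the two weights sum to exactly one, so constants are interpolated exactly even if the stencil is only approximately zero-sum.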

In semicoarsening black box multigrid, the analogous choice would be to use

$$ -NW - N - NE - SW - S - SE - W - E \qquad (4.3) $$

instead of $C$ in (2.3). In normal semicoarsening black box multigrid [4] (and for $\Omega^{M-1}_0$ here),
however, this choice leads to no improvement in convergence factor. For $\Omega^{M-1}_1$, the analogous
choice is to use

$$ |-NW - W - SW + S + N - SE - E - NE| \qquad (4.4) $$

instead of $C$ in (2.3) to derive (2.5); this choice on coarser grids can lead to operators for which
$C + NW + N + NE + W + E + SW + S + SE > 0$ is no longer valid. Hence, it seems safer to use
(4.4) only for level $M$.

To summarize, in Table 4 CSFDMA, CSFDMB, and CSFDMC have the following meaning:

CSFDMA: CSFDM with (4.3) and (4.4) enforced at all levels.

CSFDMB: CSFDM with (4.4) enforced at all levels.

CSFDMC: CSFDM with (4.4) enforced only at level M.

Simple analysis shows what can go wrong with using C instead of (4.4). Let us consider the 
operator 

r)' 

where ?? and e are both nonnegative and small. In this case, if we use C instead of (4.3), |£ - | and 
|B+| in (2.5) are both diag(8 ), where 6 = ^+r- Thus 0 < 6 < |. A computation shows that I\LI\ has 
the form (2.2), where C = (l-2<?+40 2 ) + (l+20 2 )??+(l+26O<:, W = E = -± + §e, S = N = -20+20 2 +6> 2 77+20e, 
and SW = NW = NE = SE = 9 - 9 2 + 0 2 e. For rj sufficiently large, W + C + E is positive and A 0 in 
(2.3) is invertible. For e = .05 and rj — .02, 6 = and C + W + E is negative. Thus by continuity, 

C + W + E is zero for some values of rj and e, and A 0 is singular; for nearby values of rj and e, A 0 is 
nearly singular, and ill-conditioning occurs. If we use (4.3), however, then 9 = |, E = W = — | + |e, 

C = 1 + |?7 + e, and W + C + E = §?? + 5e is always positive. Similar arguments show that if (4.3) is 
always used, then $C + NW + N + NE + W + E + SW + S + SE < 0$ can happen on coarser grids.
Numerical experiments show that this seems to happen on grids of the form
$\Omega^{M-k}_{j_1, \ldots, j_k}$ (see (2.6)), where $k > 3$ and $j_i = j_{i+1} = 1$ for some $i$. Such grids
can be deleted [20, 8] without harming the convergence factor, as illustrated by the nearly identical
performance of CSFDMB and CSFDMC. Thus an alternative would be to include the corrections
from such grids with weight zero.


I 1 -* - 

L — [ —l + e 2 + t? -1 

V 4 i-‘ - 
AN OPTIMAL ORDER NONNESTED MIXED MULTIGRID
METHOD FOR GENERALIZED STOKES PROBLEMS 


Qingping Deng

Abstract. A multigrid algorithm is developed and analyzed for generalized Stokes problems
discretized by various nonnested mixed finite elements within a unified framework. It is abstractly
proved by an element-independent analysis that the multigrid algorithm converges with an optimal
order if there exists a “good” prolongation operator. A technique to construct a “good” prolongation
operator for nonnested multilevel finite element spaces is proposed. Its basic idea is to introduce a
sequence of auxiliary nested multilevel finite element spaces and define a prolongation operator as a
composite operator of two single grid level operators. This makes not only the construction of a
prolongation operator much easier (the final explicit forms of such prolongation operators are fairly
simple), but the verification of the approximate properties for prolongation operators is also
simplified. Finally, as an application, the framework and technique are applied to seven typical
nonnested mixed finite elements.

Key words, generalized Stokes problems, mixed methods, multigrid algorithm, nonnested 

AMS(MOS) subject classifications. 65F10, 65N30 

1. Introduction. This paper will develop an optimal order multigrid algorithm for
solving mixed finite element equations of the following generalized Stokes problems:

(1.1)    $-\Delta u + \nabla p = f$,   in $\Omega$,
         $\mathrm{div}\, u = g$,   in $\Omega$,
         $u = 0$,   on $\partial\Omega$,

where $\Omega$ is a bounded convex domain in $R^2$. If $f \in H^{-1}(\Omega)$ and $g \in L_0^2(\Omega)$, (1.1) is
uniquely solvable (cf. [20]). We refer to [7] and [20] for notations and definitions of the
function spaces used in this paper. The velocity-pressure variational formulation of the
saddle point problem for (1.1) is to find $[u,p] \in (H_0^1(\Omega))^2 \times L_0^2(\Omega)$ such that

(1.2)    $\mathcal{L}([u,p],[v,q]) = F([v,q])$,   $\forall\, [v,q] \in (H_0^1(\Omega))^2 \times L_0^2(\Omega)$,

where $(\cdot,\cdot) = (\cdot,\cdot)_\Omega$ stands for the inner product in $L^2(\Omega)$ or $(L^2(\Omega))^2$, and

(1.3)    $\mathcal{L}([u,p],[v,q]) = (\nabla u, \nabla v) - (p, \mathrm{div}\, v) - (q, \mathrm{div}\, u)$,

(1.4)    $F([v,q]) = (f,v) + (g,q)$.

Let $T_k$ ($k \ge 0$) be a quasi-uniform triangular or rectangular partition of $\Omega$ with mesh
size $h_k = h_0 2^{-k}$. $T_k$ is obtained by linking the midpoints of the three edges of all triangles
of $T_{k-1}$ or by linking the midpoints of two opposite sides of all rectangles of $T_{k-1}$. For
simplicity, we also assume that $\bar\Omega = \bigcup_{K \in T_k} K$. Let $X_k \subset (L^2(\Omega))^2$, $M_k \subset L_0^2(\Omega)$ be two
finite element approximate spaces of $(H_0^1(\Omega))^2$ and $L_0^2(\Omega)$ associated with $T_k$. The mixed
finite element method for (1.2) at level $k$ is to find $[u_k,p_k] \in X_k \times M_k$ such that

(1.5)    $\mathcal{L}_k([u_k,p_k],[v,q]) = F_k([v,q])$,   $\forall\, [v,q] \in X_k \times M_k$,

where $(\cdot,\cdot)_k = \sum_{K \in T_k} (\cdot,\cdot)_K$, and

(1.6)    $\mathcal{L}_k([u,p],[v,q]) = (\nabla u, \nabla v)_k - (p, \mathrm{div}\, v)_k - (q, \mathrm{div}\, u)_k$,

(1.7)    $F_k([v,q]) = (f,v)_k + (g,q)_k$.

It is well known that $X_k$ and $M_k$ must satisfy the Babuska-Brezzi condition, i.e.,

(1.8)    $\sup_{v_k \in X_k} \dfrac{|(q, \mathrm{div}\, v_k)_k|}{\|v_k\|_k} \ge \gamma_0 \|q\|_{L^2(\Omega)}$,   $\forall\, q \in M_k$,

where $\|v_k\|_k^2 = (\nabla v_k, \nabla v_k)_k$, and $\gamma_0$ is a positive number independent of $k$ and $h_k$. We also
assume that the following error estimate and interpolation property hold (cf. [13], [20]):

(1.9)    $\|u - u_k\|_{L^2(\Omega)} + h_k \bigl( \|u - u_k\|_k + \|p - p_k\|_{L^2(\Omega)} \bigr) \le C h_k^2 \bigl( \|u\|_{H^2(\Omega)} + \|p\|_{H^1(\Omega)} \bigr)$,

(1.10)   $\|v - \Pi_k v\|_{L^2(\Omega)} + h_k \bigl( \|v - \Pi_k v\|_k + \|q - \pi_k q\|_{L^2(\Omega)} \bigr) \le C h_k^2 \bigl( \|v\|_{H^2(\Omega)} + \|q\|_{H^1(\Omega)} \bigr)$,
         $\forall\, [v,q] \in (H^2(\Omega) \cap H_0^1(\Omega))^2 \times (H^1(\Omega) \cap L_0^2(\Omega))$.

Here $\Gamma_k = [\Pi_k, \pi_k]$ is the interpolation operator associated with $X_k \times M_k$,
$[u,p] \in (H^2(\Omega) \cap H_0^1(\Omega))^2 \times (H^1(\Omega) \cap L_0^2(\Omega))$ is the solution of (1.2), and $[u_k,p_k] \in X_k \times M_k$ is the solution
of (1.5).

However, most commonly used low order mixed elements, which have a matched
approximate order, do not satisfy (1.9). So, we have to modify them by using some special
techniques, such as bubble functions, nonconforming elements, and composite elements.
Unfortunately, the first two techniques necessarily make the multilevel finite element
spaces nonnested, and so does the third one in many cases. Hence, it is of interest and
importance to study nonnested multigrid algorithms for mixed finite element equations
of (1.1). At the same time, we have to overcome some new difficulties, since the standard
multigrid theory cannot be directly applied and a prolongation operator other than the
natural injection must be chosen. However, we observe that usually only the finite element
velocity spaces are modified, while the finite element pressure spaces remain common
finite element spaces. Therefore, the multilevel finite element pressure spaces are still
nested. In view of this observation, we always assume that the nestedness of multilevel
finite element pressure spaces holds.

The objective of this paper is to develop and analyze an optimal order multigrid
algorithm in a unified framework for finite element equation (1.5). The convergence of
the multigrid algorithm is proved by an element-independent analysis. The technique
to construct a “good” prolongation operator for nonnested multilevel spaces, that is, a
prolongation operator which satisfies conditions (1.11) and (1.12), is proposed. The idea
of defining the multigrid algorithm is adopted from [20]. Our convergence analysis mainly
relies on the properties (1.11) and (1.12) of the prolongation operator
$I_{k-1}^k : X_{k-1} \times M_{k-1} \to X_k \times M_k$ given as follows:

(1.11)   $\|[v,q] - I_{k-1}^k [v,q]\|_{0,k} \le C h_k \bigl( \|v\|_{k-1} + \|q\|_{L^2(\Omega)} \bigr)$,   $\forall\, [v,q] \in X_{k-1} \times M_{k-1}$,

(1.12)   $\|[v,q] - I_{k-1}^k \Gamma_{k-1} [v,q]\|_{0,k} \le C h_k^2 \bigl( \|v\|_{H^2(\Omega)} + \|q\|_{H^1(\Omega)} \bigr)$,
         $\forall\, [v,q] \in (H^2(\Omega) \cap H_0^1(\Omega))^2 \times (H^1(\Omega) \cap L_0^2(\Omega))$,

where $\|[v,q]\|_{0,k} = \bigl( \|v\|_{L^2(\Omega)}^2 + h_k^2 \|q\|_{L^2(\Omega)}^2 \bigr)^{1/2} = \bigl( (v,v)_k + h_k^2 (q,q)_k \bigr)^{1/2}$. Since the multilevel
finite element pressure spaces are nested, we define a prolongation operator as
$I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k]$, where $i_{k-1}^k$ is the identity operator on $M_{k-1}$. Our basic idea of constructing
$\Pi_{k-1}^k$ is to define $\Pi_{k-1}^k$ as a composite operator of two single level operators which are
defined on two consecutive levels. Such an idea for constructing an intergrid operator was first
used for defining two-level Schwarz methods in [4] and [9], and then for defining multigrid
methods of plate elements in [10] (but those intergrid operators cannot be expressed in an
explicit form). Here, our approach for constructing an operator $\Pi_{k-1}^k$ is to introduce two
auxiliary spaces $W_{k-1}$ and $W_k$ corresponding to $X_{k-1}$ and $X_k$ and satisfying
$W_{k-1} \subset W_k \subset (C_0(\Omega))^2$, and to define $\Pi_{k-1}^k = \beta_k \circ i \circ \alpha_{k-1} = \beta_k \circ \alpha_{k-1}$, where $i$ is the natural
injection of $W_{k-1}$ into $W_k$, $\alpha_{k-1} : X_{k-1} \to W_{k-1}$ is an interpolation operator or the
modification of an interpolation operator which uses a local averaging technique, and
$\beta_k : W_k \to X_k$ is an interpolation operator. This approach has the following advantages.
First, it makes the construction of a prolongation operator much easier, and the final
explicit form of such a prolongation operator is fairly simple. Second, it reduces the
verification of the properties for an intergrid prolongation operator to the verification of
similar properties for two single level operators. Third, it allows us to define several
different operators $I_{k-1}^k$. We remark that the convergence analysis can be
regarded as a simplification and improvement of [11] by X. Feng and the author. Additional
information about multigrid algorithms for solving the mixed finite element equations can
be found in [3], [5], [16], [21], [22], where only a few single cases are considered.

An outline of the rest of this paper is as follows. In Section 2, the formation of the
prolongation operator $I_{k-1}^k$ and the properties of the operators $\alpha_k$ and $\beta_k$ are described in
detail, and the multigrid algorithm is defined for the mixed finite element approximations
of (1.1). In Section 3, the optimal convergence of the multigrid algorithm is demonstrated.
Finally, the abstract framework and technique developed in Sections 1-3 are applied to seven
typical nonnested mixed finite elements in Section 4. Throughout this paper, unless stated
otherwise, $C$ will denote a generic constant which is independent of the grid level $k$ and
mesh size $h_k$.

2. The prolongation operator and multigrid algorithm. In Sections 2 and 3,
we always assume that we are given a family of finite element spaces $X_k \times M_k$, $k \ge 0$, such
that $M_{k-1} \subset M_k$, $k \ge 1$, and (1.8)-(1.10) hold. We suppose that there exists a sequence
of nested finite element spaces $\{W_k\}_{k \ge 0}$ associated with $T_k$, $k \ge 0$, i.e.,
$W_{k-1} \subset W_k \subset (C_0(\Omega))^2$, $k \ge 1$. Also, we assume that two linear operators $\alpha_k$ and $\beta_k$ exist:

(2.1)    $\alpha_k : [X_k + (C_0(\Omega))^2]\ (\supset W_k) \to W_k$,     $\beta_k : (W_k \subset)\ [(C_0(\Omega))^2 + X_k] \to X_k$.

We assume that $\alpha_k$ and $\beta_k$ satisfy the following properties:

(H.1)    $\alpha_k \circ \alpha_k = \alpha_k$ on $W_k$,     $\beta_k \circ \beta_k = \beta_k$ on $X_k$,

(H.2)    $\|v - \alpha_k v\|_{L^2(\Omega)} \le C h_k^2 \|v\|_{H^2(\Omega)}$,   $\forall\, v \in (H^2(\Omega) \cap H_0^1(\Omega))^2$,

(H.3)    $\|v - \alpha_k v\|_{L^2(\Omega)} \le C h_k \|v\|_k$,   $\forall\, v \in X_k$,

(H.4)    $\|v - \beta_k v\|_{L^2(\Omega)} \le C h_k^2 \|v\|_{H^2(\Omega)}$,   $\forall\, v \in (H^2(\Omega) \cap H_0^1(\Omega))^2$,

(H.5)    $\|v - \beta_k v\|_{L^2(\Omega)} \le C h_k \|v\|_k$,   $\forall\, v \in W_k$,

(H.6)    $\|\alpha_k (v + w)\|_{L^2(\Omega)} \le C \|v + w\|_{L^2(\Omega)}$,   $\forall\, v \in X_k,\ w \in W_k$,

(H.7)    $\|\beta_k (v + w)\|_{L^2(\Omega)} \le C \|v + w\|_{L^2(\Omega)}$,   $\forall\, v \in X_k,\ w \in W_k$.

We now define the prolongation operator $I_{k-1}^k : X_{k-1} \times M_{k-1} \to X_k \times M_k$ as follows:

(2.2)    $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k]$.

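The relation of $\Pi_{k-1}^k$, $\alpha_{k-1}$, and $\beta_k$ is illustrated by the commutative diagram (2.3):

(2.3)    $X_{k-1} \xrightarrow{\ \alpha_{k-1}\ } W_{k-1} \subset W_k \xrightarrow{\ \beta_k\ } X_k$,     $\Pi_{k-1}^k = \beta_k \circ \alpha_{k-1} : X_{k-1} \to X_k$.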
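To make the composite construction concrete, the following is a minimal one-dimensional sketch, not one of the elements treated in this paper. It assumes a hypothetical coarse space of discontinuous piecewise-linear functions on a uniform grid (each cell stores its own left and right endpoint values), takes the auxiliary spaces to be continuous piecewise-linear functions, uses nodal averaging for the role of $\alpha_{k-1}$, evaluation at the fine nodes for the injection, and cellwise restriction for the role of $\beta_k$. All function names are illustrative.

```python
from typing import List, Tuple

Cell = Tuple[float, float]  # (left endpoint value, right endpoint value) on one cell


def alpha_average(cells: List[Cell]) -> List[float]:
    """Role of alpha_{k-1}: discontinuous data -> continuous P1 nodal values.

    At an interior node the two one-sided values are averaged; the two
    boundary nodes are set to zero (homogeneous Dirichlet condition).
    """
    n = len(cells)
    nodal = [0.0] * (n + 1)
    for i in range(1, n):
        nodal[i] = 0.5 * (cells[i - 1][1] + cells[i][0])
    return nodal


def inject(nodal: List[float]) -> List[float]:
    """Natural injection of the coarse P1 space into the fine one (midpoint refinement)."""
    fine: List[float] = []
    for a, b in zip(nodal[:-1], nodal[1:]):
        fine.extend([a, 0.5 * (a + b)])
    fine.append(nodal[-1])
    return fine


def beta_interpolate(nodal: List[float]) -> List[Cell]:
    """Role of beta_k: continuous P1 function -> cellwise representation on the fine grid."""
    return list(zip(nodal[:-1], nodal[1:]))


def prolong(cells: List[Cell]) -> List[Cell]:
    """Composite prolongation beta_k o (injection) o alpha_{k-1}."""
    return beta_interpolate(inject(alpha_average(cells)))


# A coarse function with a jump at the interior node: one-sided values 1.0 and 3.0.
jumping = [(0.0, 1.0), (3.0, 0.0)]
prolonged = prolong(jumping)
```

In this toy setting the jump at the interior node is replaced by its average before the function is carried to the fine grid, which is the kind of modification by local averaging that the text allows for $\alpha_{k-1}$; data that are already continuous pass through unchanged.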

Following [21], we define the multigrid algorithm for solving the mixed finite element
equation at level $k$ as follows. Find $[w,p] \in X_k \times M_k$ such that

(2.4)    $\mathcal{L}_k([w,p],[v,q]) = G_k([v,q])$,   $\forall\, [v,q] \in X_k \times M_k$,

where $G_k$ is a linear functional on $X_k \times M_k$. In particular, it takes the following form on
the finest grid: $G_k([v,q]) = F_k([v,q])$.
Multigrid Algorithm

(i) If $k = 0$, (2.4) is solved directly.

(ii) If $k > 0$, let $[w^0,p^0] \in X_k \times M_k$ be an initial guess and define $[w^{m+1},p^{m+1}] \in X_k \times M_k$
as follows:

Smoothing step: For $1 \le i \le m$, $[w^i,p^i]$ is defined by

    $(\tilde w^i, v)_k + h_k^2 (\tilde p^i, q)_k = \Lambda_k^{-2} \bigl( G_k([v,q]) - \mathcal{L}_k([w^{i-1},p^{i-1}],[v,q]) \bigr)$,   $\forall\, [v,q] \in X_k \times M_k$,
    $(w^i - w^{i-1}, v)_k + h_k^2 (p^i - p^{i-1}, q)_k = \mathcal{L}_k([\tilde w^i,\tilde p^i],[v,q])$,   $\forall\, [v,q] \in X_k \times M_k$,

where $[\tilde w^i,\tilde p^i] \in X_k \times M_k$ is an auxiliary pair.

Correction step: Set

    $[w^{m+1},p^{m+1}] = [w^m,p^m] + I_{k-1}^k [\psi,\tau]$,

where $[\psi,\tau] \in X_{k-1} \times M_{k-1}$ is the approximation of $[\psi^*,\tau^*] \in X_{k-1} \times M_{k-1}$ defined by
applying $\mu$ iterations with zero as an initial guess of the level $(k-1)$ algorithm to the
residual equation

(2.5)    $\mathcal{L}_{k-1}([\psi^*,\tau^*],[v,q]) = G_{k-1}([v,q])$,   $\forall\, [v,q] \in X_{k-1} \times M_{k-1}$.

Here,

    $G_{k-1}([v,q]) = G_k(I_{k-1}^k [v,q]) - \mathcal{L}_k([w^m,p^m], I_{k-1}^k [v,q])$.

In this algorithm, $m$ is some positive integer to be determined and $\mu$ is any positive
integer constant greater than or equal to two. In addition, $\Lambda_k = O(h_k^{-2})$ is chosen to be
the maximal absolute value of the eigenvalues of the following eigenvalue problem. Find
$[\varphi_k,\nu_k] \in X_k \times M_k$, $\lambda \in R \setminus \{0\}$ such that

(2.6)    $\mathcal{L}_k([\varphi_k,\nu_k],[v,q]) = \lambda \bigl( (\varphi_k, v)_k + h_k^2 (\nu_k, q)_k \bigr)$,   $\forall\, [v,q] \in X_k \times M_k$.

3. Convergence analysis. In this section, we will discuss the convergence of the
algorithm defined in the previous section by using induction. A uniform error reduction
rate bounded away from one is proved in the two-grid case provided that sufficiently many
smoothing steps are performed. By standard arguments (cf. [2], [12], [14], [18]) the result is
then extended to the multilevel algorithm. To show the approximation property, we need
to assume $H^2$-regularity for (1.1), which is true if $\Omega$ is a convex polygon (cf. [17]).

Clearly, the eigenvalue problem (2.6) has a complete set of eigenfunctions since $\mathcal{L}_k(\cdot,\cdot)$
is symmetric. Now, let $\{\lambda_j\}$, $\{[\varphi_k^j,\nu_k^j]\}$, for $j = 1, 2, \ldots, N_k$, be the eigenvalues and
corresponding eigenfunctions, normalized to be orthonormal with respect to the inner
product on the right-hand side of (2.6). Then, for any $[v_k,q_k] \in X_k \times M_k$, coefficients $c_j$,
$j = 1, 2, \ldots, N_k$, exist such that $[v_k,q_k] = \sum_{j=1}^{N_k} c_j [\varphi_k^j,\nu_k^j]$. Thus, we define the
mesh-dependent norm as follows:

(3.1)    $|||[v_k,q_k]|||_{s,k} = \Bigl\{ \sum_{j=1}^{N_k} |\lambda_j|^s c_j^2 \Bigr\}^{1/2}$.
It is easy to verify the following inequalities: for all $[u_k,p_k], [v_k,q_k] \in X_k \times M_k$,

(3.2)    $|||[v_k,q_k]|||_{0,k} = \|[v_k,q_k]\|_{0,k}$,

(3.3)    $|||[v_k,q_k]|||_{t,k} \le C h_k^{s-t}\, |||[v_k,q_k]|||_{s,k}$,   $s < t$,

(3.4)    $|\mathcal{L}_k([u_k,p_k],[v_k,q_k])| \le |||[u_k,p_k]|||_{2,k}\, |||[v_k,q_k]|||_{0,k}$.

Let $(I_{k-1}^k)^* : X_k \times M_k \to X_{k-1} \times M_{k-1}$ ($k \ge 1$) be defined by

(3.5)    $\mathcal{L}_{k-1}((I_{k-1}^k)^* [v_k,q_k], [v_{k-1},q_{k-1}]) = \mathcal{L}_k([v_k,q_k], I_{k-1}^k [v_{k-1},q_{k-1}])$,
         $\forall\, [v_{k-1},q_{k-1}] \in X_{k-1} \times M_{k-1},\ [v_k,q_k] \in X_k \times M_k$.

Then we have

(3.6)    $|||(I_{k-1}^k)^* [v_k,q_k]|||_{2,k-1} \le C\, |||[v_k,q_k]|||_{2,k}$,   $\forall\, [v_k,q_k] \in X_k \times M_k$.

LEMMA 3.1. Under the assumptions (H.1)-(H.7), the operator $I_{k-1}^k$ defined by (2.2)
satisfies the properties (1.11), (1.12), i.e., $I_{k-1}^k$ is a “good” prolongation operator.

It is not difficult to prove Lemma 3.1 by using (H.1)-(H.7) and the triangle inequality.
Moreover, by using (1.5)-(1.12), (3.1)-(3.6), and a duality argument similar to that of
the proof of Lemma 3.4 (cf. [2], [5], [11], [14], [21]), we can prove the following two lemmas,
which, along with Lemma 3.1, are the keys to proving Lemma 3.4 (approximation property).

LEMMA 3.2. Let $d \in L_0^2(\Omega)$ and $[\sigma_j,\tau_j] \in X_j \times M_j$ ($j = k-1, k$) satisfy

(3.7)    $\mathcal{L}_j([\sigma_j,\tau_j],[v,q]) = (d,q)_j$,   $\forall\, [v,q] \in X_j \times M_j$.

Then we have

(3.8)    $\|\sigma_j\|_j + \|\tau_j\|_{L^2(\Omega)} \le C \|d\|_{L^2(\Omega)}$,   ($j = k-1, k$),

(3.9)    $\|[\sigma_k,\tau_k] - [\sigma_{k-1},\tau_{k-1}]\|_{0,k} \le C h_k \|d\|_{L^2(\Omega)}$.

LEMMA 3.3. Let $F \in (L^2(\Omega))^2$ and $[\sigma_j,\tau_j] \in X_j \times M_j$ ($j = k-1, k$) satisfy

(3.10)   $\mathcal{L}_j([\sigma_j,\tau_j],[v,q]) = (F,v)_j$,   $\forall\, [v,q] \in X_j \times M_j$.

Then we have

(3.11)   $\|[\sigma_{k-1},\tau_{k-1}] - (I_{k-1}^k)^* [\sigma_k,\tau_k]\|_{0,k-1} \le C h_k^2 \|F\|_{L^2(\Omega)}$.

We now establish the approximation property.
LEMMA 3.4. (approximation property) The following inequality holds:

(3.12)   $|||[v,q] - I_{k-1}^k (I_{k-1}^k)^* [v,q]|||_{0,k} \le C h_k^2\, |||[v,q]|||_{2,k}$,   $\forall\, [v,q] \in X_k \times M_k$.

Proof. Let $[\zeta,\theta] = (I_{k-1}^k)^* [v,q] \in X_{k-1} \times M_{k-1}$, for any $[v,q] \in X_k \times M_k$. Then

(3.13)   $|||[v,q] - I_{k-1}^k (I_{k-1}^k)^* [v,q]|||_{0,k}^2 = \|v - \Pi_{k-1}^k \zeta\|_{L^2(\Omega)}^2 + h_k^2 \|q - \theta\|_{L^2(\Omega)}^2$.

By using a duality argument similar to that of [5] and [11], we obtain

(3.14)   $\|v - \Pi_{k-1}^k \zeta\|_{L^2(\Omega)} \le C h_k^2\, |||[v,q]|||_{2,k}$.

We now estimate $\|q - \theta\|_{L^2(\Omega)}$. Let $[\sigma_j,\tau_j] \in X_j \times M_j$ ($j = k-1, k$) satisfy

(3.15)   $\mathcal{L}_j([\sigma_j,\tau_j],[v',q']) = (q - \theta, q')_j$,   $\forall\, [v',q'] \in X_j \times M_j$.

Then we have

(3.16)   $\|q - \theta\|_{L^2(\Omega)}^2 = (q - \theta, q)_k - (q - \theta, \theta)_{k-1}$
                $= \mathcal{L}_k([\sigma_k,\tau_k],[v,q]) - \mathcal{L}_{k-1}([\sigma_{k-1},\tau_{k-1}],[\zeta,\theta])$
                $= \mathcal{L}_k([\sigma_k,\tau_k] - I_{k-1}^k [\sigma_{k-1},\tau_{k-1}],[v,q])$.

Combining (1.11) and (3.4)-(3.11) and using the triangle inequality, we have

(3.17)   $\|q - \theta\|_{L^2(\Omega)} \le C h_k\, |||[v,q]|||_{2,k}$.

Thus, (3.12) follows from (3.13), (3.14) and (3.17).

Let $[e^i,\varepsilon^i] = [w - w^i, p - p^i]$, $i = 0, 1, 2, \ldots, m+1$, be the error functions of the $i$th
iteration of the multigrid algorithm defined in Section 2 with $m$ smoothing steps at level $k$.
The following smoothing property was proven by Verfürth in [21].

LEMMA 3.5. (smoothing property) For any initial guess, the following inequality
holds:

(3.18)   $|||[e^m,\varepsilon^m]|||_{2,k} \le C h_k^{-2} m^{-1/2}\, |||[e^0,\varepsilon^0]|||_{0,k}$.
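The decay in the number of smoothing steps can be checked on a single eigenmode. The sketch below rests on an assumption that is our reading of the smoothing step of Section 2, not a statement taken from [21]: each sweep multiplies the component of the error along an eigenfunction of (2.6) with eigenvalue $\lambda$ by $1 - \lambda^2/\Lambda_k^2$. Writing $x = |\lambda|/\Lambda_k \in [0,1]$, the contribution of that mode to the 2-norm after $m$ sweeps is $\Lambda_k\, x (1 - x^2)^m$ times its contribution to the 0-norm, and elementary calculus gives $\max_x x(1-x^2)^m \le (2m+1)^{-1/2}$. The function names are illustrative.

```python
import math


def damping(x: float, m: int) -> float:
    """Scaled amplification x * (1 - x^2)^m of one eigenmode after m sweeps.

    x = |lambda| / Lambda_k lies in [0, 1] under the stated assumption.
    """
    return x * (1.0 - x * x) ** m


def worst_case(m: int, samples: int = 2001) -> float:
    """Largest scaled amplification over a uniform sample of x in [0, 1]."""
    return max(damping(i / (samples - 1), m) for i in range(samples))


# Sampled worst case next to the analytic bound (2m + 1)^(-1/2).
table = {m: (worst_case(m), 1.0 / math.sqrt(2 * m + 1)) for m in (1, 4, 16, 64)}
for m, (observed, bound) in table.items():
    print(f"m = {m:3d}   sampled max = {observed:.4f}   bound = {bound:.4f}")
```

Under this assumption the worst mode is damped like $m^{-1/2}$, so the number of smoothing steps needed for a prescribed two-grid reduction does not depend on the level.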

(V 

From the smoothing and approximation properties and by the standard perturbation
argument for showing convergence of a W-cycle multigrid algorithm (cf. [2], [12], [14], [18], [21]),
we get the following convergence theorem for the multigrid algorithm of Section 2.

CONVERGENCE THEOREM. Let $I_{k-1}^k$ be a “good” prolongation operator and let $\mu > 1$
in the multigrid algorithm. Then a constant $0 < \gamma < 1$ and a positive integer $m$ exist, all
independent of the level number $k$, such that if

    $|||[\psi^*,\tau^*] - [\psi,\tau]|||_{0,k-1} \le \gamma^\mu\, |||[\psi^*,\tau^*]|||_{0,k-1}$,

then

(3.19)   $|||[w,p] - [w^{m+1},p^{m+1}]|||_{0,k} \le \gamma\, |||[w,p] - [w^0,p^0]|||_{0,k}$.
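The role of the requirement $\mu > 1$ can be seen from a scalar recursion. A common form of the perturbation argument in the cited literature bounds the level-$k$ contraction factor by $\gamma_k \le \gamma_{tg} + c\, \gamma_{k-1}^\mu$, where $\gamma_{tg}$ is the two-grid factor and $c$ bounds the prolonged coarse-level error. The sketch below iterates that recursion; the numerical values of $\gamma_{tg}$ and $c$ are hypothetical and chosen only for illustration, and the recursion itself is an assumption about the form of the argument rather than a formula from this paper.

```python
from typing import List


def level_factors(two_grid: float, c: float, mu: int, levels: int) -> List[float]:
    """Iterate gamma_k = two_grid + c * gamma_{k-1}**mu.

    The recursion starts from gamma_0 = 0, i.e. an exact solve on the coarsest level.
    """
    gammas = [0.0]
    for _ in range(levels):
        gammas.append(two_grid + c * gammas[-1] ** mu)
    return gammas


# Hypothetical constants: two-grid factor 0.1 and perturbation constant 2.
w_cycle = level_factors(two_grid=0.1, c=2.0, mu=2, levels=30)
v_cycle = level_factors(two_grid=0.1, c=2.0, mu=1, levels=30)

print("mu = 2, largest factor over 30 levels:", max(w_cycle))
print("mu = 1, factor on level 4:", v_cycle[4])
```

With $\mu = 2$ the factors increase monotonically toward the smaller root of $\gamma = 0.1 + 2\gamma^2$ and so stay bounded independently of the level; with $\mu = 1$ and the same constants the bound exceeds one after a few levels, so this style of argument gives no level-independent rate.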

4. Applications. In this section, we will apply the framework and technique developed
and analyzed in the previous sections to seven typical nonnested mixed finite elements
for (1.1), which all satisfy

(4.1)    $M_{k-1} \subset M_k$,   $X_{k-1} \not\subset X_k$,   $k \ge 1$.

To do this, we need to construct a sequence of nested auxiliary finite element spaces $W_k$
satisfying $W_{k-1} \subset W_k \subset (C_0(\Omega))^2$, to define the operators $\alpha_k$ and $\beta_k$, and to give the
explicit formulations of the intergrid prolongation operators $\Pi_{k-1}^k$ and $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k]$
for each specific element. Finally, we need to verify that $\alpha_k$ and $\beta_k$ satisfy the assumptions
(H.1)-(H.7). It then follows from Lemma 3.1 that $I_{k-1}^k$ is a “good” prolongation operator.
As has been explained before, we know that $\beta_k$ and the restrictions of $\alpha_k$ on $(C_0(\Omega))^2$
should be some interpolation operators. Therefore, it is quite clear that (H.1), (H.2), and
(H.4) hold. Thus, we only need to verify (H.3), (H.5), (H.6), and (H.7). The basic idea
for proving these four estimates is to use the fact that a linear operator from a finite
dimensional space to another finite dimensional space is bounded, and to combine this with the
standard scaling argument technique (cf. [4], [7], [9], [23]). Here we only give the proof for
the Crouzeix-Raviart nonconforming element. The proofs for other elements can be carried
out similarly. We hereafter denote by $\bar\Pi_k$ the local averaging of an interpolation operator
$\Pi_k$; that means, for any nodal parameter $p$, $\bar\Pi_k v(p) = \bar v(p) = v(p)$ if $v$ is continuous
at $p$, and $\bar v(p)$ takes the local average of $v$ at $p$ if $v$ has a jump at $p$. Finally, we remark
that our results show that the bubble function part of the coarse level correction can be
ignored in the prolongation step for all elements enriched by bubble functions and that
some prolongation operators in existing multigrid algorithms also can be derived by using
our technique.

Example 1: The Mini element and the Bernardi-Raugel element

These two elements are based on triangles (cf. [1], [6]). Here $T_k$ is a triangulation of
$\Omega$ for each $k \ge 0$. The Mini element is defined as follows:

(4.2)    $X_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_1(K) \oplus \mathrm{span}\{\lambda_1 \lambda_2 \lambda_3\}]^2,\ \forall\, K \in T_k \}$,

(4.3)    $M_k = \{ q \in C(\Omega) \cap L_0^2(\Omega),\ q|_K \in P_1(K),\ \forall\, K \in T_k \}$.

The Bernardi-Raugel element is defined as

(4.4)    $X_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_1(K)]^2 \oplus \mathrm{span}\{p_1, p_2, p_3\},\ \forall\, K \in T_k \}$,

(4.5)    $M_k = \{ q \in L_0^2(\Omega),\ q|_K \in P_0(K),\ \forall\, K \in T_k \}$,

where $\lambda_j$ ($j = 1, 2, 3$) are the barycentric coordinates, $p_1 = \lambda_2 \lambda_3 n_1$, $p_2 = \lambda_1 \lambda_3 n_2$,
$p_3 = \lambda_1 \lambda_2 n_3$, and $n_j$ ($j = 1, 2, 3$) are the unit normal vectors of the edges opposite to the vertices
$a_j$ ($j = 1, 2, 3$). It is easy to see that (4.1) holds here.

For both elements, we choose $W_k$, $\alpha_k$, and $\beta_k$ as follows:

(4.6)    $W_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_1(K)]^2,\ \forall\, K \in T_k \}$,

(4.7)    $\alpha_k = \Pi_k^1$,   $\beta_k = \Pi_k^1$,

where $\Pi_k^1$ stands for the linear interpolation operator associated with $T_k$. Moreover, by
using direct computations, we have

(4.8)    $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k] = [\Pi_{k-1}^1, i_{k-1}^k]$.

Now, as has been explained, we can prove that $I_{k-1}^k$ defined by (4.8) is a “good”
prolongation operator. This shows that the CONVERGENCE THEOREM holds with $I_{k-1}^k$
defined by (4.8) for the Mini element and the Bernardi-Raugel element.
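For the Mini element, the remark that the bubble part of the coarse level correction can be ignored has a simple pointwise explanation: the cubic bubble $\lambda_1 \lambda_2 \lambda_3$ vanishes at the three vertices and at the three edge midpoints of its triangle, which are exactly the nodes a linear interpolation on the triangle or on its midpoint refinement looks at. The short check below evaluates the bubble at those points in exact rational arithmetic; the helper name `bubble` is illustrative.

```python
from fractions import Fraction as Fr


def bubble(l1: Fr, l2: Fr, l3: Fr) -> Fr:
    """Cubic bubble lambda_1 * lambda_2 * lambda_3 in barycentric coordinates."""
    assert l1 + l2 + l3 == 1, "barycentric coordinates must sum to one"
    return l1 * l2 * l3


vertices = [(Fr(1), Fr(0), Fr(0)), (Fr(0), Fr(1), Fr(0)), (Fr(0), Fr(0), Fr(1))]
edge_midpoints = [
    (Fr(0), Fr(1, 2), Fr(1, 2)),
    (Fr(1, 2), Fr(0), Fr(1, 2)),
    (Fr(1, 2), Fr(1, 2), Fr(0)),
]
barycenter = (Fr(1, 3), Fr(1, 3), Fr(1, 3))

nodal_values = [bubble(*p) for p in vertices + edge_midpoints]
peak = bubble(*barycenter)

print("values at vertices and edge midpoints:", nodal_values)
print("value at the barycenter:", peak)
```

The same observation does not apply verbatim to the Bernardi-Raugel edge bubbles, which are nonzero at the midpoint of their own edge.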

Remark 4.1. For the Mini element, we can also choose $W_k$, $\alpha_k$, and $\beta_k$ as follows:

(4.9)    $W_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_3(K)]^2,\ \forall\, K \in T_k \}$.

We define $\alpha_k = \Pi_k^3$, $\beta_k = \Pi_k^m$, where $\Pi_k^3$ and $\Pi_k^m$ stand for the cubic interpolation
operator and the Mini element interpolation operator associated with $T_k$. Then we can
get another “good” prolongation operator defined by

    $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k] = [\Pi_k^m, i_{k-1}^k]$.


Example 2: The Crouzeix-Raviart $P_2^+ - P_1$ element and the Taylor-Hood $P_2^+ - P_1$ element

Both are triangular elements, which have the same finite element approximate
space for the velocity field:

(4.10)   $X_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_2(K) \oplus \mathrm{span}\{\lambda_1 \lambda_2 \lambda_3\}]^2,\ \forall\, K \in T_k \}$,

where $\lambda_j$ ($j = 1, 2, 3$) are defined in Example 1. For the Crouzeix-Raviart $P_2^+ - P_1$ element,
the finite element space of the pressure field is

(4.11)   $M_k = \{ q \in L_0^2(\Omega),\ q|_K \in P_1(K),\ \forall\, K \in T_k \}$,

and, for the Taylor-Hood $P_2^+ - P_1$ element,

(4.12)   $M_k = \{ q \in L_0^2(\Omega) \cap C(\Omega),\ q|_K \in P_1(K),\ \forall\, K \in T_k \}$.

Also, it is easy to show that (4.1) holds for these two elements. Here we choose $W_k$ as
in (4.9), i.e., the cubic conforming finite element space on $T_k$. We define $\alpha_k = \Pi_k^2$ and
$\beta_k = \Pi_k^c$ (or $\Pi_k^t$), where $\Pi_k^2$ and $\Pi_k^c$ (or $\Pi_k^t$) stand for the quadratic interpolation operator and the
Crouzeix-Raviart $P_2^+ - P_1$ (or the Taylor-Hood $P_2^+ - P_1$) element interpolation operator
associated with $T_k$. Then we have

(4.13)   $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k] = [\Pi_{k-1}^2, i_{k-1}^k]$.

It may be verified that $I_{k-1}^k$ defined by (4.13) is a “good” prolongation operator. Therefore,
the CONVERGENCE THEOREM holds with $I_{k-1}^k$ defined by (4.13) for the Crouzeix-Raviart
$P_2^+ - P_1$ element and the Taylor-Hood $P_2^+ - P_1$ element.

Remark 4-%- For these two elements and the space Wk, we can choose a* = njj. and 

lit = n|, or at = n*j and ft = n£. Then we have 7{_~= «£_,] = PI, it,] or 

Ik-i ~ [n|, i ] , which are two “good” prolongation operators. 

Example 3: The composite $P_1 - P_1$ element

This is a very simple composite element for (1.1). For each $k \ge 0$, $T_k$ is a triangular
partition; $X_k$ and $M_k$ are defined as follows:

(4.14)   $X_k = \{ v \in (C_0(\Omega))^2,\ v|_{K_i} \in [P_1(K_i)]^2,\ K = \bigcup_{i=1}^3 K_i,\ \forall\, K \in T_k \}$,

(4.15)   $M_k = \{ q \in L_0^2(\Omega) \cap C(\Omega),\ q|_K \in P_1(K),\ \forall\, K \in T_k \}$,

where the $K_i$ ($i = 1, 2, 3$) are obtained by connecting the three vertices of $K$ with the
barycenter. Clearly, this element is stable and satisfies (1.8)-(1.10) (cf. [19]), and (4.1)
holds.

Here, we choose $W_k$ defined by (4.6), i.e., the linear finite element space on $T_k$, and
define $\alpha_k = \Pi_k^1$ and $\beta_k = \Pi_k^1$, where $\Pi_k^1$ is defined in Example 1. Therefore, we have

(4.16)   $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k] = [\Pi_{k-1}^1, i_{k-1}^k]$,

which is a “good” prolongation operator. Thus the CONVERGENCE THEOREM
holds with $I_{k-1}^k$ defined by (4.16) for the composite $P_1 - P_1$ element.


Example 4: The Crouzeix-Raviart nonconforming element

This is the most well-known nonconforming finite element (cf. [8]). For each $k \ge 0$, $T_k$
is a triangular partition; $X_k$, $M_k$ are defined by

(4.17)   $X_k = \{ v,\ v|_K \in [P_1(K)]^2,\ \forall\, K \in T_k,\ v$ is continuous at $p \in \mathcal{N}_k,\ v(p) = 0,\ p \in \partial\mathcal{N}_k \}$,

(4.18)   $M_k = \{ q \in L_0^2(\Omega),\ q|_K \in P_0(K),\ \forall\, K \in T_k \}$,

where, here and in the next example, $\mathcal{N}_k$ stands for the set of midpoints of the edges of $T_k$ in $\Omega$,
and $\partial\mathcal{N}_k$ is the set of midpoints along $\partial\Omega$. Obviously, (4.1) holds for the Crouzeix-Raviart
nonconforming element. Here we define $W_k$, $\alpha_k$, and $\beta_k$ as follows:

(4.19)   $W_k = \{ v \in (C_0(\Omega))^2,\ v|_e \in [P_2(e)]^2,\ \forall\, e \in T_{k+1} \}$,

(4.20)   $\alpha_k = \bar\Pi_{k+1}^2$,   $\beta_k = \Pi_k^c$,

where $\Pi_k^2$ is defined in Example 2 and $\Pi_k^c$ stands for the standard Crouzeix-Raviart
nonconforming element interpolation operator associated with $T_k$. Then we obtain after some
computations

(4.21)   $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k] = [\bar\Pi_k^c, i_{k-1}^k]$.

Moreover, the following two lemmas will show that $I_{k-1}^k$ defined by (4.21) is a “good”
prolongation operator. Hence, the CONVERGENCE THEOREM holds with $I_{k-1}^k$ defined
by (4.21) for the Crouzeix-Raviart nonconforming element.

LEMMA 4.1. $\alpha_k$ and $\beta_k$ defined by (4.20) satisfy (H.3) and (H.5), respectively.

Proof. We only give a proof for (H.3); (H.5) can be proved similarly. For any
$K \in T_k$, we denote by $G(K) \subset \Omega$ the domain $G(K) = \bigcup \{ K' \in T_k,\ K' \cap K \ne \emptyset \}$, and
let

    $\alpha_{G(K)} = \alpha_k|_{G(K)} : X_k|_{G(K)} \to W_k|_K$.

Thus, if $\partial K \cap \partial\Omega = \emptyset$, then it is not hard to show that $P_0(G(K)) \subset X_k|_{G(K)}$, $\alpha_{G(K)}$ is a
linear operator, and

    $\alpha_{G(K)} p_0 = p_0$,   $\forall\, p_0 \in P_0(G(K))$.

Furthermore, $\|\nabla v\|$ is a norm over $X_k|_{G(K)} / P_0(G(K))$. Therefore, for any $K \in T_k$ with $\partial K \cap \partial\Omega = \emptyset$,
we have, by using the standard scaling argument,

(4.22)   $\|v - \alpha_{G(K)} v\|_{L^2(K)} \le C h_k \|\nabla v\|_{L^2(G(K))}$,   $\forall\, v \in X_k|_{G(K)}$.

For any $K \in T_k$ with $\partial K \cap \partial\Omega \ne \emptyset$, $\|\nabla v\|$ is still a norm over $X_k|_{G(K)}$ since $v$ vanishes
at all midpoints of the sides along $\partial\Omega$. Therefore, (4.22) still holds. Hence, (H.3) follows
from summing up (4.22) for all $K \in T_k$.

LEMMA 4.2. $\alpha_k$ and $\beta_k$ defined by (4.20) satisfy (H.6) and (H.7), respectively.

Proof. We only consider (H.7); (H.6) can be treated similarly. For any $K \in T_k$, it is
easy to see that

    $\beta_K = \beta_k|_K : (W_k + X_k)|_K \to X_k|_K$

is a linear operator. So, by using the standard scaling argument technique, we have that

(4.23)   $\|\beta_K (v + w)\|_{L^2(K)} \le C \|v + w\|_{L^2(K)}$,   $\forall\, v \in X_k|_K,\ w \in W_k|_K$.

Thus, summing up (4.23) for all $K \in T_k$, we complete the proof for (H.7).

Remark 4.3. If we choose

    $W_k = \{ v \in (C_0(\Omega))^2,\ v|_K \in [P_1(K)]^2,\ \forall\, K \in T_k \}$,

$\alpha_k = \bar\Pi_k^1$, and $\beta_k = \Pi_k^c$, where $\Pi_k^1$ is defined in Example 1, then
$I_{k-1}^k = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k]$ is also a “good” prolongation operator. Again, if we choose

    $W_k = \{ v \in (C_0(\Omega))^2,\ v|_e \in [P_1(e)]^2,\ \forall\, e \in T_{k+1} \}$,

$\alpha_k = \bar\Pi_{k+1}^1$, and $\beta_k = \Pi_k^c$, then we have a “good” prolongation operator again.

Example 5: A rectangular nonconforming element

This element was proposed in [15]. It can be regarded as an extension of the Crouzeix-Raviart
nonconforming element to the case of a rectangle. This element has five element
nodal parameters, which are the function values at the four midpoints of the sides and the
center of the rectangle. For each $k \ge 0$, $T_k$ is a rectangular partition; $X_k$ and $M_k$ are defined
as follows:

(4.24)   $X_k = \{ v,\ v|_K \in [P_K]^2,\ \forall\, K \in T_k,\ v$ is continuous at $p \in \mathcal{N}_k,\ v(p) = 0,\ \forall\, p \in \partial\mathcal{N}_k \}$,

(4.25)   $M_k = \{ q \in L_0^2(\Omega),\ q|_K \in Q_0(K),\ \forall\, K \in T_k \}$,

where

    $P_K = \{ p(x) = \hat p(F_K^{-1}(x)),\ \hat p \in \hat P \}$.

Here $F_K$ is an affine mapping from the reference rectangle $\hat K$ onto the rectangle $K$, and
$\hat P = \mathrm{span}\{ 1, \hat x_1, \hat x_2, \varphi(\hat x_1), \varphi(\hat x_2) \}$ on $\hat K$, $\varphi(t) = \frac{1}{2}(5t^4 - 3t^2)$. Then, (4.1) holds for this
rectangular nonconforming element.

For this rectangular nonconforming element, we take $W_k$ as follows:

(4.19)   $W_k = \{ v \in (C_0(\Omega))^2,\ v|_e \in [Q_2(e)]^2,\ \forall\, e \in T_{k+1} \}$,

$\alpha_k = \bar\Pi_{k+1}^2$ is defined as in Example 4, and $\beta_k = \Pi_k^r$, which is the standard rectangular
nonconforming element interpolation operator associated with $T_k$. We then obtain

(4.31)   $I_{k-1}^k = [\Pi_{k-1}^k, i_{k-1}^k] = [\beta_k \circ \alpha_{k-1}, i_{k-1}^k]$.

We can show that this $I_{k-1}^k$ is a “good” prolongation operator. Therefore, the CONVERGENCE
THEOREM holds with $I_{k-1}^k$ defined by (4.31) for this rectangular nonconforming
element.





REFERENCES

[1] D. N. Arnold, F. Brezzi, and M. Fortin, A stable finite element for the Stokes equations, Calcolo,
    21 (1984), pp. 337-344.
[2] R. E. Bank and T. Dupont, An optimal order process for solving finite element equations, Math.
    Comp., 36 (1980), pp. 35-51.
[3] D. Braess and R. Verfürth, Multigrid methods for nonconforming finite element methods, SIAM
    J. Numer. Anal., 27 (1988), pp. 979-986.
[4] S. C. Brenner, Two-level additive Schwarz preconditioners for nonconforming finite element method,
    preprint.
[5] S. C. Brenner, A nonconforming mixed multigrid method for the pure displacement problem in planar
    linear elasticity, SIAM J. Numer. Anal., 30 (1993), pp. 116-135.
[6] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York,
    1991.
[7] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam,
    1978.
[8] M. Crouzeix and P. A. Raviart, Conforming and nonconforming finite element methods for
    solving the stationary Stokes equations I, R.A.I.R.O. Model. Math. Anal. Numer., R-3 (1973),
    pp. 33-75.
[9] Q. Deng and X. Feng, Schwarz methods for conforming and nonconforming finite element
    approximations of plate bending problems, Part I: Two-level methods, Technical Report MA-01-94,
    The University of Tennessee, 1994.
[10] Q. Deng and X. Feng, Optimal order nonnested multigrid methods for the biharmonic problems, preprint.
[11] Q. Deng and X. Feng, Multigrid methods for the Stokes equations by mixed methods, preprint.
[12] C. Douglas, Multigrid algorithms with applications to elliptic boundary-value problems, SIAM J.
    Numer. Anal., 21 (1984), pp. 236-254.
[13] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations,
    Springer-Verlag, Berlin and Heidelberg, 1986.
[14] W. Hackbusch, Multigrid Methods and Applications, Springer-Verlag, Heidelberg and New York,
    1985.
[15] H. Han, Nonconforming element in the mixed finite element method, J. Comput. Math., 3 (1984),
    pp. 223-233.
[16] Z. Huang, A multi-grid algorithm for mixed problems with penalty, Numer. Math., 57 (1990), pp.
    227-247.
[17] R. B. Kellogg and J. E. Osborn, A regularity result for the Stokes problem on a convex polygon,
    J. Funct. Anal., 21 (1976), pp. 397-431.
[18] J. Mandel, S. McCormick, and R. Bank, Variational multigrid theory, in Multigrid Methods, S.
    McCormick, ed., SIAM Frontiers in Applied Mathematics, Vol. 3, SIAM, Philadelphia, 1987,
    pp. 131-177.
[19] J. Qin, On the Convergence of Some Low Order Mixed Finite Elements for Incompressible Fluids,
    Ph.D. thesis, Dept. of Math., Penn. State Univ., 1994.
[20] R. Temam, Navier-Stokes Equations, Theory and Numerical Analysis, North-Holland, Amsterdam,
    1983.
[21] R. Verfürth, A multi-level algorithm for mixed problems, SIAM J. Numer. Anal., 21 (1984), pp.
    264-271.
[22] R. Verfürth, Multi-level algorithms for mixed problems II. Treatment of the mini-element, SIAM J.
    Numer. Anal., 25 (1988), pp. 285-293.
[23] J. Xu, Theory of Multilevel Methods, Ph.D. thesis, Dept. of Math., Cornell Univ., 1989.




A NOTE ON MULTIGRID THEORY FOR NON-NESTED GRIDS 
AND/OR QUADRATURE 


C. C. Douglas 

IBM Thomas J. Watson Research Center 
Yorktown Heights, NY 
and 

Department of Computer Science, Yale University 
New Haven, CT 

J. Douglas, Jr. 

Department of Mathematics, Purdue University 
West Lafayette, IN 

D. E. Fyfe 

Laboratory for Computational Physics and Fluid Dynamics 
Naval Research Laboratory 
Washington, DC 


SUMMARY 

We provide a unified theory for multilevel and multigrid methods when the usual 
assumptions are not present. For example, we do not assume that the solution spaces 
or the grids are nested. Further, we do not assume that there is an algebraic rela- 
tionship between the linear algebra problems on different levels. What we provide is 
a computationally useful theory for adaptively changing levels. Theory is provided 
for multilevel correction schemes, nested iteration schemes, and one way (i.e., coarse 
to fine grid with no correction iterations) schemes. We include examples showing the 
applicability of this theory: finite element examples using quadrature in the matrix
assembly and finite volume examples with non-nested grids. Our theory applies directly
to other discretizations as well.

INTRODUCTION 

In this paper, we do not make the usual multigrid assumptions. In particular, the
grids are not necessarily nested. The norms correspond to inner products on a grid,
but the inner products are not necessarily identical from level to level. There may
or may not be an algebraic relationship between the linear algebra problems on different
levels.

We provide what is really a three-level analysis rather than the more traditional
two-level theory. Among other things, this provides a rigorous basis for adaptively
changing levels.

Assume that there are $j$ spaces $\mathcal{M}_k$, $1 \le k \le j$, approximating some solution space
$\mathcal{M}$. Also assume that $\dim \mathcal{M}_k \le \dim \mathcal{M}_{k+1}$.

A set of approximate problems

    $A_k u_k + f_k = 0$,   $u_k, f_k \in \mathcal{M}_k$,   $A_k \in \mathcal{L}(\mathcal{M}_k)$,        (1)

will be solved approximately instead of the desired linear problem

    $A u + f = 0$,   $u, f \in \mathcal{M}$,   $A \in \mathcal{L}(\mathcal{M})$.

As usual in multigrid procedures, two sets of mappings between neighboring spaces
are assumed to exist:

    $\mathcal{P}_{k-1} : \mathcal{M}_{k-1} \to \mathcal{M}_k$     prolongation (or interpolation)

    $\mathcal{R}_k : \mathcal{M}_k \to \mathcal{M}_{k-1}$     restriction (or projection)

In some cases, each $A_k$ is related to $A_{k+1}$ by

    $A_k = \mathcal{R}_{k+1} A_{k+1} \mathcal{P}_k$.

However, the theorems in this paper do not assume this relation.

For partial differential equations that are discretized in a standard fashion, there
can be natural definitions for $\mathcal{R}_{k+1}$ and $\mathcal{P}_k$. Some of these are described in detail and
shown graphically in [1] and [5].

Now, define a $k$-level standard correction multilevel algorithm:

Algorithm MG($k$, $\{\mu_\ell\}_{\ell=1}^{j}$, $x_k$, $f_k$)

(1) If $k = 1$ or $\mu_k = 0$, then solve $A_k x_k + f_k = 0$ to some accuracy.

(2) If $k > 1$ and $\mu_k > 0$, then repeat (2a)-(2c) for $i = 1, \ldots, \mu_k$:

    (2a) Update $x_k$ using the pre-solver.

    (2b) Solve a residual correction problem on level $k - 1$:

         $x_k \leftarrow x_k + \mathcal{P}_{k-1}\, \mathrm{MG}(k-1, \{\mu_\ell\}_{\ell=1}^{j}, 0, \mathcal{R}_k (A_k x_k + f_k))$.

    (2c) Update $x_k$ using the post-solver.

(3) Return $x_k$.
It is assumed that $0 \le \mu_j \le 1$ in this definition. In practice, $\mu_j > 1$ is common,
but this can be interpreted as the repetition $\mu_j$ times of the algorithm for the case of
$\mu_j = 1$.

On all but level 1 (the coarsest grid level), two solvers are associated with a level: 
a pre-solver and a post-solver. These surround the coarser level correction (2b). In 
most real applications, only one solver is associated with a level (one of the pre- or 
post-solvers is the identity operation). The solvers can be smoothers, roughers, or 
direct solvers. 
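As an illustration, Algorithm MG can be sketched in a few lines of Python. Everything problem-specific below is an assumption made for the sketch and is not taken from the paper: the 1D Poisson matrices, the damped Jacobi pre- and post-solver, linear interpolation for $\mathcal{P}_{k-1}$, full weighting for $\mathcal{R}_k$, and the helper names (`mg`, `build`, `run_demo`). Only the recursion in `mg` mirrors steps (1)-(3), including the sign convention $A_k x_k + f_k = 0$.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def residual(A, x, f):
    # Residual z = A x + f in the sign convention A x + f = 0.
    return [ax + fi for ax, fi in zip(matvec(A, x), f)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def jacobi(A, x, f, sweeps=2, omega=2.0 / 3.0):
    # Damped Jacobi: an assumed stand-in for the pre- and post-solver.
    for _ in range(sweeps):
        z = residual(A, x, f)
        x = [xi - omega * zi / A[i][i] for i, (xi, zi) in enumerate(zip(x, z))]
    return x

def solve(A, x, f):
    # "Solve to some accuracy": exact for one unknown, many sweeps otherwise.
    if len(f) == 1:
        return [-f[0] / A[0][0]]
    return jacobi(A, x, f, sweeps=200)

def mg(k, mu, x, f, A, P, R):
    if k == 1 or mu[k] == 0:                            # step (1)
        return solve(A[k], x, f)
    for _ in range(mu[k]):                              # step (2)
        x = jacobi(A[k], x, f)                          # (2a) pre-solver
        z = matvec(R[k], residual(A[k], x, f))          # R_k (A_k x_k + f_k)
        c = mg(k - 1, mu, [0.0] * len(z), z, A, P, R)   # (2b) coarse correction
        x = [xi + pc for xi, pc in zip(x, matvec(P[k - 1], c))]
        x = jacobi(A[k], x, f)                          # (2c) post-solver
    return x                                            # step (3)

def laplacian(n):
    # Assumed model operator: 1D Laplacian on n interior points of (0, 1).
    h2 = (1.0 / (n + 1)) ** 2
    return [[(2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0) / h2
             for j in range(n)] for i in range(n)]

def prolongation(nc):
    # Linear interpolation from nc coarse points to 2*nc + 1 fine points.
    P = [[0.0] * nc for _ in range(2 * nc + 1)]
    for j in range(nc):
        i = 2 * j + 1
        P[i][j] = 1.0
        P[i - 1][j] = 0.5
        P[i + 1][j] = 0.5
    return P

def build(j):
    # Level k has 2^k - 1 unknowns; level 1 is the coarsest.
    A = {k: laplacian(2 ** k - 1) for k in range(1, j + 1)}
    P = {k: prolongation(2 ** k - 1) for k in range(1, j)}
    R = {k: [[0.5 * P[k - 1][i][c] for i in range(len(P[k - 1]))]
             for c in range(len(P[k - 1][0]))] for k in range(2, j + 1)}
    return A, P, R

def run_demo(j=4, cycles=10):
    A, P, R = build(j)
    mu = {k: 1 for k in range(1, j + 1)}
    n = 2 ** j - 1
    f = [-1.0] * n
    x = [0.0] * n
    r0 = norm(residual(A[j], x, f))
    for _ in range(cycles):
        x = mg(j, mu, x, f, A, P, R)
    return r0, norm(residual(A[j], x, f))
```

Calling `run_demo()` returns the residual norm before and after ten cycles with $\mu_k = 1$ on every level. Setting `mu[k] = 0` on a level makes the recursion stop there, which corresponds to the first branch of step (1).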

In order to analyze Algorithm MG from an iterative method viewpoint, we transform
it into a nonstandard form similar to that introduced in [1]. First, add an
additional level $j + 1$, which is just a repetition of level $j$:

$$\mathcal{M}_{j+1} = \mathcal{M}_j, \quad \mathcal{P}_j = \mathcal{R}_{j+1} = I, \quad A_{j+1} = A_j, \quad C_{1,j} = C_{2,j} = 1.$$

The initial residual $z_{j+1}$ is then given by

$$A_{j+1} x_{j+1} + f = z_{j+1}.$$


All analysis can now be done using residual correction problems. 

Define the following:

$z_{k+1}$    The residual on level $k + 1$ at some step.

$x_k^{(-1)}$    The initial guess for level $k$; this is normally 0, except
on level $j + 1$.

Now, define a $k$-level nonstandard correction multilevel algorithm:

Algorithm NSMG($k$, $\{\mu_\ell\}_{\ell=1}^{j}$, $z_{k+1}$, $x_k^{(-1)}$)

(1) Initial residual: $\mathcal{R}_{k+1} z_{k+1} \in \mathcal{M}_k$.

(2) Initial pre-solve: Update xjf ^ to get such that 

Akx^ + n k+1 z k+ i = 4°\ where Il4 0) ll < / ? i 1 ) lt^fc+i||- 

(3) Let xj^ = xf\ 4 J) = z ( k \ and 7i 1} = 0. 

(4) If $\mu_k > 0$, then repeat $i = 1, \ldots, \mu_k$:

(4a) If i > 1, then 

(4al) Residual: A k x^~^ + 'R,k+iz k +i = 0 $ • 

(4a2) Pre-solve: Update to get xj^ such that 

Akxf + Kk+iZk+i = z%\ where p^ll < 
(4b) If k > 1, then 

(4bl) Correction: 7 ^ = Vk-ix^, where 

x£i = NSMG(fc - 1, {^}L, -4\ 0) ^d 

Ak-lX k -i + T^-kz\? = 4 - 1 - 
(4c) Calculate $\sigma_k^{(i)}$:

P? + ^n-ixSJj < a< i) p« i) + Pt-.A^xf^W. 

(4d) Residual: Aili 1 ,' 1 + -,{°j + R i+ ,* t+1 = 

(4e) Post-solve: Update + 7^ to get x^ such that 

A k x^ ] + 1 l k+ iz k+1 = 4°, where ||4 l) || < 4‘ ) l|0* ) ||- 
(5) Return $x_k^{(\mu_k)}$.

This is almost the same algorithm as was analyzed in [1]. The difference is in 
step (4c). Here we calculate the norm of the difference between the effect of two 
similar operators on the correction with respect to the residual before the correction 
was computed. 

Consider the example of adaptively changing levels based on reducing the residual
norm adequately. We can calculate $\sigma_k^{(i)}$ while computing a correction in step (4b).
Based on the size of $\sigma_k^{(i)}$, we can determine if the current candidate for $x_{k-1}$ is sufficient
in order to maintain convergence on level $k$ (or a fast enough convergence rate). Should
$\sigma_k^{(i)}$ be too large, more corrections on level $k - 2$ or a better approximation on level $k - 1$
might be appropriate.

In order to consider a priori analysis, the actual forms for p^ k and should 

be substituted. Examples of these forms can be found for various elliptic partial 
differential equations and iterative solvers in [2] and [3]. 

A second multigrid variant is a nested iteration scheme, which begins computation
on level 1 and traverses the levels to some level $j$, using each level $k$, $k < j$, to generate
an initial guess for level $k + 1$ and possibly for solving residual correction problems.
Define a $k$-level standard nested iteration multigrid scheme by

Algorithm NI($j$, $\{\mu_\ell\}_{\ell=1}^{j}$, $x_1$, $\{f_k\}_{k=1}^{j}$)

(1) For $k = 1, \ldots, j$, do

(1a) If $k > 1$, then $x_k \leftarrow \mathcal{P}_{k-1} x_{k-1}$

(1b) $x_k \leftarrow \mathrm{MG}(k, \{\mu_\ell\}_{\ell=1}^{j}, x_k, f_k)$

(2) Return $x_j$

Note that $\mu_\ell = 1$, all $\ell$, corresponds to full multigrid (or nested iteration V cycle).
Choosing $\mu_\ell = 0$, all $\ell$, corresponds to one way multigrid, i.e., no correction cycles
whatsoever (see [3] and [4]).

Define a nonstandard nested iteration multilevel algorithm by

Algorithm NSNI($j$, $\{\mu_\ell\}_{\ell=1}^{j}$, $x_1^{(-1)}$)

(1) repeat $k = 1, \ldots, j$:

(1a) Initial guess: If $k > 1$, then

(1a1) $x_k^{(-1)} = \mathcal{P}_{k-1} x_{k-1}^{(\mu_{k-1})}$.

(1b) Residual: $z_k = A_k x_k^{(-1)} + f_k$.

(1c) Solve: $x_k^{(\mu_k)} = \mathrm{NSMG}(k, \{\mu_\ell\}_{\ell=1}^{j}, z_k, x_k^{(-1)})$.

(2) Return $x_j^{(\mu_j)}$.


THEORY 


In this section, we state some basic theorems, based on a simple theory that is 
computationally useful, including for adaptively changing levels. See [5] for the proofs. 

Associated with each level is a norm, $\|\cdot\|_k$. Assume that

$\forall u \in \mathcal{M}_k,$

where the forms of $C_{1,k}$ and $C_{2,k}$ are known; these constants can depend on the
coefficients in the differential problem and on the grid. A large value of $C_{2,k}$ will inhibit
the rate of convergence.

The basic theorem for Algorithm NSMG is the following: 

Theorem 1. Assume the following for all levels 1 < k < j: 


1. Zj+i is the residual on level j + 1 > 2. 

2. z^ is the residual on level k at step i. 

3. ||4° + 

4- \\{I-Vk-xn k )zf\\<8f\\zf\\. 


Let 


Then, 


[j,},. 

£<■> = 4‘Vi 1 ’ and £<«> = R (4‘VM’ [4 0 + CW^fe"’]) • 

J = 1 


dWc)| 




The proof of Theorem 1 is a double induction argument and can be found in [5]. 

A more precise analysis, based on an affine space decomposition of each M k , 
would follow the analysis in [1]. 
The basic theorem for Algorithm NSNI is the following:

Theorem 2. Make the same assumptions as in Theorem 1. Further assume that 
( 1 ) is approximated by some such that 

Akik + fk = Ok (2) 

starting from some initial guess Xk — Vk-i^k-i- Given some {Cfc}{ =1 , we want 

pill < CilPi^i + /ill, 

< 

. Pfcll < CfcPfc-ill, l<k<j. 

Then 

E { k k) <( k for 1 < k < j (3) 

for an appropriate choice of 

The proof of Theorem 2 is obvious (see [3] for example). Note that by calculating 
8^ and cr^ as a computation progresses, the choice of and ej^ can be chosen 
adaptively to ensure that (3) is satisfied. 

The one way multigrid method is a common computational method in engineering
applications. It has been used for decades as a method for producing an initial guess
on the grid on which a solution to a problem is actually wanted. Southwell describes
this process in [4] as a procedure that he first saw in the 1920's.

Consider a typical partial differential equation problem to be solved numerically.
It is discretized on a set of grids $\Omega_k$, $1 \le k \le j$, with some notion of grid spacing (or
a mesh diameter) $h_k$.

The basic theorem for one way multigrid is the following: 

Theorem 3. Make the same assumptions as in Theorem 2. Further assume pk = 0, 
1 < h < j, and that 

6k = Ch 9 k , C,q,h> OelR. 

Then 

C k = CC 2 ,k(hk/hk-i) q 

is adequate to ensure that ( 2 ) is satisfied. Hence, ( 3 ) is satisfied with 

Once again the proof is obvious. Note that Theorem 3 gives a simple bound for one
way multigrid that is independent of the solver used on each level.

FINITE VOLUME EXAMPLE 

Consider the two-point boundary value problem

$$\begin{cases} -(a(x)u_x)_x + c(x)u = f(x), & x \in \Omega = [0, 1], \\ u(0) = u(1) = 0. \end{cases} \qquad (4)$$
A finite volume discretization of (4) yields

$$a_{i-1/2} u_{i-1} + (\Delta x_i c_i - a_{i-1/2} - a_{i+1/2}) u_i + a_{i+1/2} u_{i+1} = \Delta x_i f_i, \quad i = 1, \ldots, N,$$

on level $k$ where $a_{i+1/2} = 2a(x_{i+1/2})/(\Delta x_i + \Delta x_{i+1})$, and $\Delta x_i$ is the length of cell (interval)
$i$. While the grid points $x_{i+1/2}$ in a finite volume multigrid procedure are nested, the
locations of the unknowns $u_i$ are not nested.

Clearly, one would not use a multigrid approach to solve this problem. However, 
multigrid is a viable alternative for the equivalent multi-dimensional problem. The 
following remarks generalize to the multi-dimensional case through the use of tensor 
product formulations for the prolongation and restriction operators. We discuss the 
one-dimensional case for clarity. 

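Let us define a restriction matrix

$$\mathcal{R}_k = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & \cdots & \cdots & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & \cdots & \cdots & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & \cdots & \cdots & 0 & 0 \\
\vdots & & & & & & \ddots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & \cdots & 1 & 1
\end{pmatrix}.$$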


This is just piecewise linear interpolation. We can also define a prolongation matrix
$\mathcal{P}_{k-1} = 2\mathcal{R}_k^T$. This prolongation matrix corresponds to piecewise constant
interpolation; clearly, not a very accurate choice, but a demonstrative one.

If we formulate the coarse grid matrix from $A_{k-1} = \mathcal{R}_k A_k \mathcal{P}_{k-1}$ and a restricted
right-hand-side from $\mathcal{R}_k$, we obtain

$$a_{2i-3/2} u_{i-1} + (\Delta x_{2i} c_{2i} + \Delta x_{2i-1} c_{2i-1} - a_{2i-3/2} - a_{2i+1/2}) u_i + a_{2i+1/2} u_{i+1} = \Delta x_{2i-1} f_{2i-1} + \Delta x_{2i} f_{2i}, \quad i = 1, \ldots, N/2.$$

This is a reasonable coarse grid approximation where the only difference from the
finite volume discretization on the coarse grid would be in the use of the underlying
fine grid to discretize the finite volume integral.
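The coarse stencil above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes $\mathcal{R}_k$ as the 0/1 matrix exactly as printed (any scalar prefactor is left out), takes the prolongation to copy each coarse value into its two fine cells, uses zero-based cell indices, and drops the boundary neighbours because of the homogeneous Dirichlet condition. The function names and the sample data are hypothetical.

```python
def fine_matrix(a_face, dx, c):
    # a_face[i] is the face coefficient left of cell i, a_face[i + 1] the one right of it.
    n = len(dx)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = dx[i] * c[i] - a_face[i] - a_face[i + 1]
        if i > 0:
            A[i][i - 1] = a_face[i]
        if i < n - 1:
            A[i][i + 1] = a_face[i + 1]
    return A

def galerkin_coarse(A):
    # R A P with R summing fine cells pairwise and P copying coarse values:
    # entry (I, J) is the sum of the 2 x 2 fine block belonging to coarse cells I and J.
    n = len(A) // 2
    return [[sum(A[i][j] for i in (2 * I, 2 * I + 1) for j in (2 * J, 2 * J + 1))
             for J in range(n)] for I in range(n)]

def coarse_formula(a_face, dx, c):
    # The coarse stencil as written in the text, shifted to zero-based indices.
    n = len(dx) // 2
    Ac = [[0.0] * n for _ in range(n)]
    for I in range(n):
        Ac[I][I] = (dx[2 * I] * c[2 * I] + dx[2 * I + 1] * c[2 * I + 1]
                    - a_face[2 * I] - a_face[2 * I + 2])
        if I > 0:
            Ac[I][I - 1] = a_face[2 * I]
        if I < n - 1:
            Ac[I][I + 1] = a_face[2 * I + 2]
    return Ac
```

With these assumptions the two constructions agree entry by entry: the interior face coefficient shared by the two fine cells of a coarse cell cancels in the pairwise sum.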

A straightforward calculation of $\|(I - \mathcal{P}_{k-1}\mathcal{R}_k)x\|$ for arbitrary $x$ shows that
$\delta_k = 1/\sqrt{2}$.

A more practical prolongation matrix would use quadratic interpolation (see [5]).
The use of this prolongation matrix in the definition of the coarse matrices would
expand the bandwidth of each successive coarser matrix. This defeats the purpose
of multigrid where one expects to do less work on the coarser grids. The use of this
prolongation matrix with the piecewise linear interpolation restriction matrix gives

$\delta_k = \sqrt{531}/32$.

We can calculate $\delta_k$ for multidimensional problems when tensor product meshes
are in use. The calculation of $\delta_k$ for the piecewise linear-piecewise constant case is
fairly easy. In this case, the $\mathcal{P}_{k-1}\mathcal{R}_k$ calculation in one dimension separates into a
local computation. It effectively collects the cells on the fine grid pairwise to form the
coarse cell by averaging. In multiple dimensions, because of the tensor product nature
of $\mathcal{P}_{k-1}$ and $\mathcal{R}_k$, that still happens. Hence, the algebra produces a worst case $\delta_k$ of

$$\sqrt{(2^d - 1)/2^d},$$

where $d$ is the dimension of the problem. So,

    $d$    $\delta_k$

    1    $\sqrt{1/2}$

    2    $\sqrt{3/4}$

    3    $\sqrt{7/8}$

This approaches 1 rapidly. However, in any given application, $\delta_k$ can be smaller.

The quadratic interpolation case provides better results in multiple dimensions, as 
would be expected. 
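The tabulated values can be reproduced with a short computation. The sketch below rests on two assumptions that the text does not state explicitly: the norm is the Euclidean norm, and the test vector is nonzero in exactly one of the $2^d$ fine cells that make up a coarse cell. The product $\mathcal{P}_{k-1}\mathcal{R}_k$ is modeled as block averaging, as described above; `delta_single_cell` is a hypothetical helper name.

```python
import math

def delta_single_cell(d):
    # One coarse cell contains 2^d fine cells in d dimensions.
    m = 2 ** d
    x = [1.0] + [0.0] * (m - 1)      # assumed test vector: a single nonzero fine cell
    avg = sum(x) / m                 # P R x replaces every entry by the block average
    diff = [xi - avg for xi in x]    # (I - P R) x restricted to this block
    return math.sqrt(sum(v * v for v in diff)) / math.sqrt(sum(v * v for v in x))
```

For $d = 1, 2, 3$ this ratio equals $\sqrt{(2^d - 1)/2^d}$, matching the three rows of the table.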


AN EXAMPLE OF THE FAILURE OF $A_k = \mathcal{R}_{k+1} A_{k+1} \mathcal{P}_k$


Consider the boundary value problem

$$\begin{aligned} -\sum_{i,j=1}^{2} (a_{ij}(x) u_{x_i})_{x_j} + \sum_{i=1}^{2} b_i(x) u_{x_i} + c(x) u &= f(x), & x &\in \Omega = [0, 1]^2, \\ u &= 0, & x &\in \partial\Omega. \end{aligned} \qquad (5)$$

Let the $k$-level partition $S_k$ of $\Omega$ consist of squares of side length $2^{-(k+t)}$, where $t$
is independent of $k$. Let the $k$-level finite element space $\mathcal{M}_k$ consist of $C^0$-bilinear
functions over $S_k$ that vanish on $\partial\Omega$. Then, the natural $k$-level Galerkin equations,


AkUk — 


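would be generated by seeking a function $u_k \in \mathcal{M}_k$ such that

$$\sum_{i,j=1}^{2} \left(a_{ij} u_{k,x_i}, v_{k,x_j}\right) + \sum_{i=1}^{2} \left(b_i u_{k,x_i}, v_k\right) + (c u_k, v_k) = (f, v_k), \quad v_k \in \mathcal{M}_k, \qquad (6)$$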


where $(\cdot, \cdot)$ indicates the inner product on $L^2(\Omega)$. Note that exact integration is not,
in general, feasible. Thus, it is usually necessary to invoke a quadrature rule to
approximate the integrals in (6). A $(2 \times 2)$-Gauss quadrature rule suffices to maintain
unique solvability of the resulting linear equations, along with the proper asymptotic
order of accuracy of the $k$-level approximation to the solution of (5). Denote by $(\cdot, \cdot)_G$
the $(2 \times 2)$-Gauss quadrature approximation to $(\cdot, \cdot)$. Define the $k$-level equations (5)
through the approximation

$$\sum_{i,j=1}^{2} \left(a_{ij} u_{k,x_i}, v_{k,x_j}\right)_G + \sum_{i=1}^{2} \left(b_i u_{k,x_i}, v_k\right)_G + (c u_k, v_k)_G = (f, v_k)_G, \quad v_k \in \mathcal{M}_k.$$

Consider the feasibility of the relation $A_k = \mathcal{R}_{k+1} A_{k+1} \mathcal{P}_k$ by making a simple
parameter count. If the prolongation and restriction operators are defined in terms of
the parameters related to the vertex values of a single element in the coarser partition
and the vertex values of the corresponding four squares in the finer partition, it suffices
to consider a unit square $S^1$ for the coarser element (associated with index 1) and its
partition (associated with index 2) into four squares, $S_j^2$, $j = 1, \ldots, 4$, for the finer
elements. Note that the sixteen quadrature points on the $S_j^2$ are distinct from the four
quadrature points on $S^1$. Thus, different values of the coefficients in the differential
equation enter into the formation of the equations (5).

First, let $\mathcal{M}_1$ be the span of the four bilinear basis functions associated with the
vertices of $S^1$ and $\mathcal{M}_2$ the span of the nine bilinear basis functions associated with
the vertices of $S_j^2$, $j = 1, \ldots, 4$. Let us slightly generalize the question as to whether
there exist $\mathcal{R}_{k+1}$ and $\mathcal{P}_k$ such that $A_k = \mathcal{R}_{k+1} A_{k+1} \mathcal{P}_k$ by asking if there exist maps

$$P : \mathcal{M}_1 \to \mathcal{M}_2 \quad \text{and} \quad Q : \mathcal{M}_1 \to \mathcal{M}_2$$

such that

$$(A_1 u, v) = (A_2 P u, Q v), \quad u, v \in \mathcal{M}_1. \qquad (7)$$

Consider a simple parameter count. Each of the matrices $P$ and $Q$ has 36 entries. For
each nontrivial coefficient $a_{ij}$, $b_i$, or $c$, the quadrature rule associates sixteen values of
the coefficient in the $A_2$-inner product and only four in the $A_1$-inner product. Thus,
twelve independent constraints arise for each such coefficient. Since there are seven
possibly nontrivial, distinct coefficients, it is clear that it cannot always be possible
to satisfy (7). If there were fewer coefficients to handle, the maps could exist but have
rather strange relationships to standard interpolation procedures.
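Written out, the count compares the free entries of the two maps with the constraints generated by the coefficients. The totals 72 and 84 are simple arithmetic on the numbers stated above; treating the comparison of the raw totals as the decisive step is our reading of the argument, not a statement from the paper.

```latex
\[
\underbrace{2 \times (9 \times 4)}_{\text{entries of } P \text{ and } Q} = 72
\qquad < \qquad
\underbrace{7 \times (16 - 4)}_{\text{constraints from } a_{ij},\, b_i,\, c} = 84 .
\]
```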

Consider a different question. Let us take reasonable definitions of $P$ and $Q$ and ask
to what extent (7) fails for locally smooth coefficients. Let $P = Q$ be the embedding
operator between $\mathcal{M}_1$ and $\mathcal{M}_2$, and consider the special case for which

$$a_{ij}(x) = \delta_{ij} a(x), \quad b_i(x) = c(x) = 0.$$

It follows easily from the Bramble-Hilbert lemma that

$$(A_1 u, v) - (A_2 P u, Q v) = O(\|u\|\,\|v\|), \quad u, v \in \mathcal{M}_1,$$

where the norm is the norm in H 1 . If the analogous restriction and prolongation 
operators are used at each level, 

4 ° = 1 + o(h.D 


if the $H^1$-norm is employed at each level. Thus, using the naturally associated quadrature
rule at each level is a reasonable choice for these choices for $\mathcal{R}_{k+1}$ and $\mathcal{P}_k$.

CONCLUSIONS 


The theory here is more precise than in [1]. Further, it is applicable to problems 
that are not nested and/or ones in which the linear systems use quadrature in their 
assembly. Also, the theory here allows multigrid software (e.g., [6]) adaptively to 
change levels with a higher degree of precision than with the earlier theory. 

The theory has been tested on several problems, ranging from simple (Poisson’s 
equation on a rectangle) to quite difficult (a turbulent flame simulation). In each case, 
the theory has been very close to sharp in predicting what happens to the residual 
norm on the next finer level. Hence, we can conclude that this theory is useful in real 
computing situations in which level changes occur adaptively and standard theoretical 
models do not apply. 


REFERENCES 


[1] Douglas, C. C. and Douglas, J., A unified convergence theory for abstract multigrid 
or multilevel algorithms, serial and parallel, SIAM J. Numer. Anal., 30:136-158, 
1993. 

[2] Bank, R. E. and Douglas, C. C., Sharp estimates for multigrid rates of convergence
with general smoothing and acceleration, SIAM J. Numer. Anal., 22:617-633,
1985.

[3] Douglas, C. C., Multi-grid algorithms with applications to elliptic boundary-value
problems, SIAM J. Numer. Anal., 21:236-254, 1984.

[4] Southwell, R. V., Relaxation Methods in Engineering Science, Oxford University
Press, Oxford, 1940.

[5] Douglas, C. C., Douglas, J., and Fyfe, D. E., A multigrid unified theory for
non-nested grids and/or quadrature, E. W. J. Numer. Math., 2:285-294, 1994.

[6] Douglas, C. C., Implementing abstract multigrid or multilevel methods, in Melson, 
N. D., Manteuffel, T. A., and McCormick, S. F., editors, Sixth Copper Mountain 
Conference on Multigrid Methods , volume CP 3224, pp. 127-141, Hampton, VA, 
1993, NASA. 





THE EFFECTS OF DISSIPATION AND COARSE GRID 
RESOLUTION FOR MULTIGRID IN FLOW PROBLEMS 


Peter Eliasson 

FFA, Aeronautical Research Institute 
Bromma, Sweden 


Bjorn Engquist* 

UCLA and Royal Institute of Technology 
Stockholm, Sweden 


SUMMARY 


The objective of this paper is to investigate the effects of the numerical dissipa- 
tion and the resolution of the solution on coarser grids for multigrid with the Euler 
equation approximations. The convergence is accomplished by multi-stage explicit 
time-stepping to steady state accelerated by FAS multigrid. 

A theoretical investigation is carried out for linear hyperbolic equations in one
and two dimensions. The spectra reveal that, for spatial discretizations with a small
amount of numerical dissipation, stability and hence robustness require the grid transfer
operators to be accurate enough and the smoother to be of low temporal accuracy.

Numerical results give grid independent convergence in one dimension. For two- 
dimensional problems with a small amount of numerical dissipation, however, only 
a few grid levels contribute to an increased speed of convergence. This is explained 
by the small numerical dissipation leading to dispersion. Increasing the mesh density 
and hence making the problem over resolved increases the number of mesh levels 
contributing to an increased speed of convergence. If the steady state equations are 
elliptic, all grid levels contribute to the convergence regardless of the mesh density. 


* Research sponsored by ARPA/ONR URI grant N00014-92-J-1890 and NSF DMS94-04942 



INTRODUCTION 


Multigrid methods have for a number of years been used to accelerate the conver- 
gence of the numerical solution to flow problems. This technique has been success- 
fully applied to both subsonic and transonic speeds [6], [9]; however, in the hypersonic 
regime multigrid is sometimes less robust [12]. 

The objective in this paper is to investigate the convergence properties of primarily 
the Euler equations. The effects of the numerical dissipation and the resolution of 
the solution on coarser grids are two areas of concern, see e.g. [4]. It is also intended 
to investigate if multigrid in practice can give grid independent convergence [7] or, 
if not, how many grid levels contribute to an increased speed of convergence. The 
influence on the robustness and stability of the grid transfer operators (the restriction 
and prolongation) and the smoother are also addressed and investigated. 

To analyze the solution of a hyperbolic system of equations, linear scalar equations
are studied. Central and upwind spatial discretizations are considered and the
equations are integrated in time by an explicit multistage Runge-Kutta scheme which 
serves as a smoother. The damping properties are investigated for a number of differ- 
ent discretizations in space and time and for different restrictions and prolongations. 

Numerical experiments are performed for two linear sets of equations in two di- 
mensions that are hyperbolic in time. The steady state equations are hyperbolic and 
elliptic. Numerical results are also presented for a transonic and a hypersonic case 
solving the Euler equations. This paper is part of a doctoral thesis [3]. 

THE MULTIGRID METHOD 


The FAS Multigrid Method 


In the multigrid method several coarser grids are introduced by eliminating every
other point on a finer grid. Assume that $L$ grids are used. Each level in the multigrid
method with a given grid is called a grid level. Denote the current grid level by $l$
($1 \le l \le L$), where $l = L$ is the finest grid and $l = 1$ is the coarsest grid. Let
the finest grid $L$ consist of $N$ cells and hence the coarsest grid of $N/2^{L-1}$ cells. The
FAS (Full Approximation Storage) multigrid algorithm by Brandt [1] for solving the
problem

$$L_l v_l = f_l \qquad (1)$$
can then be formulated as in [5]:

    procedure FAS($l$, $v$, $f$);
    if ($l = 1$) then
        $v := S(v, f, \nu_3)$                                    Smoothing on coarsest
    else
        $v := S(v, f, \nu_1)$                                    Pre-smoothing
        $\bar{w} := r_l^{l-1} * v$                               Restriction
        $d := L_{l-1}(\bar{w}) - r_l^{l-1} * (L_l(v) - f)$       Defect
        $w := \bar{w}$                                           Initial guess
        for $i := 1(1)\gamma$ do FAS($l - 1$, $w$, $d$)          Recursive call
        $v := v + p_{l-1}^{l} * (w - \bar{w})$                   Coarse grid correction
    end;
    $v := S(v, f, \nu_2)$                                        Post-smoothing        (2)


where $S$ is the Runge-Kutta smoother, $r_l^{l-1}$ is the restriction from the finer grid level $l$
to the coarser level $l - 1$, and $p_{l-1}^{l}$ is the prolongation from level $l - 1$ to $l$. A sawtooth
cycle is considered with one pre-smoothing and no post-smoothing in the analysis,
i.e. $\gamma = 1$, $\nu_1 = 1$, $\nu_2 = 0$, $\nu_3 = 1$. The algorithm (2) results in an iteration matrix
$M_l$ defined as

$$\begin{aligned} \text{Level } l > 1: \quad & M_l = \left(I - p_{l-1}^{l}(I - M_{l-1}) L_{l-1}^{-1} r_l^{l-1} L_l\right) S_l \\ \text{Level } 1: \quad & M_1 = S_1 \end{aligned}$$


The Grid Transfer Operators 


Central operators for the prolongation and restriction are considered with the
unknowns in the cell centers. The prolongation is denoted

$$a = p_{l-1}^{l}\, b \qquad (4)$$

and the simplest prolongation in one dimension is the piecewise constant injection
illustrated in Figure 1 where

$$a_{2j-1} = b_j, \quad a_{2j} = b_j \qquad (5)$$

which is of order $m_p = 1$, i.e. it interpolates a polynomial of degree $m_p - 1$ exactly.
We consider also a more accurate prolongation of order $m_p = 2$ that interpolates a
linear function exactly

$$a_{2j} = \frac{1}{4}(3b_j + b_{j+1}), \quad a_{2j-1} = \frac{1}{4}(3b_j + b_{j-1}) \qquad (6)$$

For the restriction operators the transposes of the prolongation operators are used

$$r_l^{l-1} = \frac{1}{2}\left(p_{l-1}^{l}\right)^T \qquad (7)$$
Figure 1: Prolongation from coarse to fine grid.

resulting in

$$b_j = \begin{cases} \frac{1}{2}(a_{2j-1} + a_{2j}), & m_r = 1 \\ \frac{1}{8}(a_{2j-2} + 3a_{2j-1} + 3a_{2j} + a_{2j+1}), & m_r = 2 \end{cases} \qquad (8)$$

In [5] it is stated that the following condition must be fulfilled:

$$m_r + m_p > 2m \qquad (9)$$

where $2m$ is the order of the differential operator. For convection problems, the
Euler equations, and the model equations considered here, $2m = 1$, i.e. the piecewise
constant prolongation ($m_p = 1$) and the restriction ($m_r = 1$) can in theory be used.
It will turn out, though, that interpolation of higher degree of accuracy is stabilizing.
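The transfer operators (5), (6) and (8) are short enough to write out directly. The sketch below is an illustration under assumptions of our own: indices are zero-based (coarse cell `j` covers fine cells `2*j` and `2*j + 1`), the grid is periodic as in the Fourier analysis, and the function names `prolong` and `restrict` are hypothetical.

```python
def prolong(b, order=1):
    # order=1: piecewise constant injection (5); order=2: linear interpolation (6).
    n = len(b)
    a = [0.0] * (2 * n)
    for j in range(n):
        if order == 1:
            a[2 * j] = b[j]
            a[2 * j + 1] = b[j]
        else:
            a[2 * j] = 0.25 * (3.0 * b[j] + b[(j - 1) % n])      # left fine cell
            a[2 * j + 1] = 0.25 * (3.0 * b[j] + b[(j + 1) % n])  # right fine cell
    return a

def restrict(a, order=1):
    # Half the transpose of the matching prolongation, as in (7) and (8).
    N = len(a)
    b = [0.0] * (N // 2)
    for j in range(N // 2):
        if order == 1:
            b[j] = 0.5 * (a[2 * j] + a[2 * j + 1])
        else:
            b[j] = 0.125 * (a[(2 * j - 1) % N] + 3.0 * a[2 * j]
                            + 3.0 * a[2 * j + 1] + a[(2 * j + 2) % N])
    return b
```

Away from the periodic wrap-around, `prolong(b, order=2)` reproduces a linear function of the cell-center coordinate exactly, which is the sense in which $m_p = 2$ is used above.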


The Smoother 


Explicit Runge-Kutta time stepping is used as a smoother in the multigrid cycle
to accelerate the convergence to steady state. To solve for the steady state equation
$L(v) = 0$, $v^n$ is iterated in time using an $m$-stage Runge-Kutta scheme defined as

$$\begin{aligned} v^{(0)} &= v^n \\ v^{(1)} &= v^{(0)} - \alpha_1 \Delta t\, L(v^{(0)}) \\ &\ \ \vdots \\ v^{(m)} &= v^{(0)} - \alpha_m \Delta t\, L(v^{(m-1)}) \\ v^{n+1} &= v^{(m)} \end{aligned} \qquad (10)$$

where the coefficients $\alpha_i$, $i = 1, 2, \ldots, m$, are chosen to make the smoothing efficient.
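One step of the scheme (10) can be sketched as follows. The function name `rk_smooth` and the list-based representation of $v$ are assumptions of this sketch; `L` is any callable that returns the residual $L(v)$ as a list of the same length.

```python
def rk_smooth(v, L, alphas, dt):
    # One m-stage step of (10): every stage restarts from v^(0) = v^n.
    v0 = list(v)
    stage = v0
    for alpha in alphas:
        Lv = L(stage)
        stage = [a - alpha * dt * r for a, r in zip(v0, Lv)]
    return stage   # v^(n+1) = v^(m)
```

For the scalar test operator $L(v) = \lambda v$ and $\alpha_m = 1$, a three-stage step multiplies $v$ by $1 + z + \alpha_2 z^2 + \alpha_1\alpha_2 z^3$ with $z = -\lambda \Delta t$, so the coefficient of $z^2$ is $\alpha_{m-1}$.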

DAMPING PROPERTIES IN 1D
A Scalar 1D Test Problem

To analyze the behavior of a hyperbolic system of conservation laws a simpler
scalar one-dimensional test equation is considered:

$$\begin{aligned} \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} &= 0; \quad t > 0, \ 0 \le x \le 2\pi \\ u(x, 0) &= u_0(x) \end{aligned} \qquad (11)$$

with periodic boundary conditions for the Fourier analysis below.
The semidiscrete form of (11) can be written as

$$\frac{d u_j}{dt} + \frac{1}{h}\left(f_{j+1/2} - f_{j-1/2}\right) = 0 \qquad (12)$$

where


$$f_{j+1/2} = \frac{1}{2}(u_j + u_{j+1}) - \frac{Q}{2}\,\Delta u_{j+1/2} + \kappa^{(4)} \Delta^3 u_{j+1/2}$$

represents a cell face flux. $\Delta$ is a central difference operator. $Q = 1$ results in a
first order accurate upwind scheme; $\kappa^{(4)} = 0$ for that scheme usually. Second order
accurate upwind schemes can be obtained by using limiters resulting in a non-linear
scheme. For the analysis, $Q$ is considered to be a constant in order to have a linear
scheme. A central difference scheme with artificial dissipation is obtained when $Q = 0$
and $\kappa^{(4)}$ is a small positive constant.


Damping of Smooth Waves 


To study the damping properties of the multigrid cycle in (2), a Fourier transform
of the iteration matrix $M_l$ is considered, denoted $\hat{M}_l$. By coupling frequencies pairwise
between a finer and a coarser grid the transformed iteration matrix $\hat{M}_l$ becomes
a block diagonal matrix with $2^{l-1} \times 2^{l-1}$ matrices on the diagonal. The damping
properties are investigated by calculating the eigenvalues to $\hat{M}_l$.

The high frequency errors are damped by the smoother, the Runge-Kutta scheme.
The low frequencies (or the smooth waves) are not damped very well, but on the other
hand it has been shown [8] that the smooth waves increase their speed using multigrid
by a factor of $2^l - 1$ for the sawtooth cycle. Under some general conditions [3] it is
possible to derive an expression for the largest eigenvalue to the transformed iteration
matrix $\hat{M}_l$ for smooth waves:

| Aj| = 1 — K,) 2 (C,(ft - j) + A,^j + 0(f) (13) 

provided that the frequency $\xi = \omega h_l$ is small enough, where $\omega$ is the wave number
and $h_l$ is the constant cell length on the finest level $l$. $\sigma$ is the CFL number, $\beta_2$ is the
constant for the square term in the Runge-Kutta polynomial $p(z) = 1 + z + \beta_2 z^2 + \cdots$
to an $m$-stage Runge-Kutta scheme. For a consistent Runge-Kutta scheme $\beta_2 = \alpha_{m-1}$
in (10). $Q_l$ is the constant $Q$ in (11) on grid level $l$. $C_l$ and $A_l$ are two constants:

A, = 2'-l C, = All (14) 

that grow exponentially with the number of grid levels. 

It is clear from (14) that $\beta_2$ should be chosen $\beta_2 > 0.5$ for good damping which is
the same as requiring the Runge-Kutta scheme to be first order accurate. It is also
clear that the multigrid increases the damping due to the factors $A_l$ and $C_l$ in (13).
As could be expected most damping is obtained from a first order accurate upwind
scheme where $Q_l = 1$. It can also be seen that the numerical dissipation on coarser
grids does not contribute to the damping of the smoothest waves. Only the term $Q_l$
on the finest grid appears in (13). This is true also for other values of $\nu_1$, $\nu_2$, $\nu_3$ in
(2). Consequently, it is not possible to increase the damping of the lowest frequencies
by using a scheme with more numerical dissipation on coarser grids.

Even though the damping of smooth waves is increased by multigrid the propa- 
gation of smooth waves dominates over the damping, which is illustrated in Figure 2. 
A smooth wave is transported 90 iterations using one grid and 6 iterations with four 
grids using an upwind discretization Q = 1, = 0. The step length is h = A-. Since 

the speed of the smooth wave is increased by a factor of 15 the wave is transported
the same distance in the one-grid and four-grid cases. A three stage Runge-Kutta
method ($\beta_2 = 0.6$) is used and CFL = $\sigma$ = 1.
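The factor of 15 and the matching iteration counts follow from the speed-up factor $2^l - 1$ quoted above for the sawtooth cycle, here with $l = 4$ grids:

```latex
\[
2^{4} - 1 = 15, \qquad 6 \ \text{multigrid iterations} \times 15 = 90 \ \text{single-grid iterations}.
\]
```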



Figure 2: Propagation of a smooth wave to the right with the upwind scheme, Q = 1, 
a) 1 grid and 90 iterations, b) 4 grids and 6 iterations. 
Eigenvalues to the Iteration Matrix


To illustrate the damping properties for different spatial discretizations and grid 
transfer operators the absolute value of the eigenvalues to the Fourier transformed 
iteration matrix of the sawtooth cycle are plotted. For all cases a 5-stage Runge- 
Kutta scheme with the coefficients (0.0814,0.191,0.342,0.574, 1.) is used [3] which is 
first order accurate and provides optimal high frequency damping for both an upwind 
scheme (Q = 1) and a central scheme (k/ 4 ) = for a CFL just above 2. The region 
of stability for this scheme is shown in Figure 3. The scheme has the advantage 
that different spatial discretizations can be used on different grid levels. In Figure 4 



Figure 3: Region of stability and locus of differencing operator for an upwind (Q = 1, 
k( 4 ) = 0) and a central ( Q — 0, scheme with five stages, A|p(z)| = 0.1. 

the damping of a central scheme is shown using a piecewise constant prolongation
($m_p = 1$) and its transpose for restriction ($m_r = 1$) with CFL = 2. The absolute
values of the eigenvalues to the matrices $M_1$, $M_2$, $M_3$, and $M_4$ are shown as a function
of the frequency where $M_1$, the single grid case, is represented with a solid line in all
figures. The increased damping of the lowest frequencies can clearly be seen.

Even though the central scheme in Figure 4 is stable for grid levels 1—4 with 
CFL = 2, divergence is obtained for lower CFL numbers. Figure 5 shows the same 
scheme but with CFL = 1.25. This scheme is obviously unstable using 3 and 4 grids. 
By increasing the accuracy of the prolongation and/or the restriction the scheme is 
stabilized. The central scheme with is a scheme with a rather small amount 

lb 

of dissipation. One could also use the dissipative first order upwind scheme on the 
coarser grids to stabilize. 


Figure 5: Eigenvalues of a) M 2 , b) M3, c) M4. M\ is the solid line. Central scheme 
is used, — jq, Q = 0, CFL = 1.25, m p = 1, m r = 1. 






The spectra for the central scheme in Figure 5 are plotted in Figure 6 with different 
levels of accuracy for the prolongation and restriction. The eigenvalues outside the 
region of stability can clearly be seen for the low order accurate grid transfer operators; 
the scheme is stabilized and the spectra brought closer to the single grid spectra 
as more accurate grid transfer operators are used. Figure 7 shows the spectra for 
an upwind method where the other conditions are the same as in Figure 6. This 
dissipative scheme is stable for all prolongations, restrictions, and single grid stable 
CFL numbers. 





Figure 6: Spectra for single-grid, two-grid, three-grid and four-grid multigrid, = 
jg, Q'= 0, CFL = 1.25. a) m p = 1, m r = 1, b) m p = 2, m r = 1, c) m p = 2, m r — 2. 

The following can be established. A certain amount of dissipation has to be 
introduced to the system. A large amount of numerical dissipation can be used, 
e.g., by choosing a first order accurate upwind scheme, which stabilizes the multigrid 
iterations. If a good converged solution is desired, however, the amount of numerical 
dissipation on the finest grid has to be rather small to avoid smearing the solution. 
The multigrid cycle is then stabilized by increasing the accuracy of the grid transfer 
operators, by choosing a Runge-Kutta scheme with a low order of accuracy and good 
high frequency damping, and by doing several Runge-Kutta sweeps on coarser grids. 

DAMPING PROPERTIES IN 2D 
A Scalar 2D Test Problem 


The 1D hyperbolic scalar equation in (11) is extended to 2D as:

$$\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} + a\frac{\partial u}{\partial y} = 0, \quad t > 0, \ 0 \le x, y \le 2\pi \qquad (15)$$

where $a$ is some constant. The discretization of (15) follows the 1D counterpart; a
similar Fourier analysis can be made assuming periodic boundary conditions with the
prolongation and restriction extended in a straightforward manner to 2D.

Figure 7: Spectra for single-grid, two-grid, three-grid and four-grid multigrid, $\kappa^{(4)} = 0$,
$Q = 1$, CFL = 1.25. a) $m_p = 1$, $m_r = 1$, b) $m_p = 2$, $m_r = 1$, c) $m_p = 2$, $m_r = 2$.


Damping of Smooth Waves 

Some important observations can be made from a Taylor expansion of small
frequencies of the largest eigenvalue to the iteration matrix. In 2D it is possible for the
Fourier transformed exact operator $i(\xi_x + a\xi_y)$ to vanish or to become very small.
$\xi_x$, $\xi_y$ are the frequencies in $x$, $y$. For a two-grid problem where the problem is solved
exactly on the coarse grid and the exact operator vanishes this implies that the
two-grid algorithm can only reduce the residual by a factor of $\frac{1}{2}$ for the first order upwind
scheme ($Q = 1$), even though the problem is solved exactly on the coarse grid. The
situation is even worse for the central scheme which is only reduced by a factor of $\frac{7}{8}$.
This is fundamentally different than in 1D where the residual is reduced $O(\xi)$.

This observation has also been made by Decker & Turkel [2]. They point out that
a fourth difference dissipation on the fine grid and a second difference dissipation on
the coarser grid can lead to a situation with practically no damping. This means that
the convergence rate per multigrid cycle cannot be made arbitrarily small. Since the
damping cannot be made arbitrarily small for a two-grid method with an exact
solution on the coarsest grid, there is consequently no use in making too many smoothing
sweeps in 2D multigrid on coarser grids.

Similar observations have been made by Mulder [10], [11]. Mulder notices that
when the exact operator vanishes the discretization is of the order of the truncation
error. The order of accuracy of the restriction and/or prolongation must then be
increased. Equation (9) is no longer valid since one is actually looking at the
truncation error, which can be viewed as a discretization of a higher order differential
equation with a value $2m > 1$ for these waves. Mulder shows that the worst-case
convergence rate can be estimated as $1 - 2^{-p}$, where $p$ is the spatial order of accuracy.
This agrees with what is found above; for a first order scheme ($p = 1$) this is $\frac{1}{2}$.
When the exact operator vanishes and the remaining operator is the fourth difference
operator, a third order accurate scheme is obtained, which corresponds to
$1 - 2^{-3} = \frac{7}{8}$.

Both Decker & Turkel and Mulder conclude that the above estimates are too 
pessimistic since they are worst case estimates. In real applications with non-periodic 
boundary conditions, better rates of convergence are usually obtained. 
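
The worst-case estimates quoted in this subsection can be tabulated directly. The
following is a minimal sketch and not code from the paper; the function names
`exact_symbol` and `worst_case_factor` are illustrative assumptions. It evaluates the
Fourier symbol $i(\xi_x + a\xi_y)$ of the exact operator and the estimate $1 - 2^{-p}$.

```python
def exact_symbol(xi_x: float, xi_y: float, a: float) -> complex:
    """Fourier symbol i*(xi_x + a*xi_y) of the steady operator d/dx + a d/dy."""
    return 1j * (xi_x + a * xi_y)


def worst_case_factor(p: int) -> float:
    """Worst-case two-grid reduction factor 1 - 2**(-p) for spatial order p."""
    if p < 1:
        raise ValueError("order of accuracy must be a positive integer")
    return 1.0 - 2.0 ** (-p)


if __name__ == "__main__":
    a, xi = 0.5, 0.3
    # Along the line xi_x = -a*xi_y the exact operator vanishes.
    print(exact_symbol(-a * xi, xi, a))
    for p in (1, 2, 3):
        print(p, worst_case_factor(p))
```

The values for $p = 1$ and $p = 3$ reproduce the factors $\frac{1}{2}$ and $\frac{7}{8}$
mentioned in the text.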

NUMERICAL EXPERIMENTS 

The 2D equation (15) is used for numerical experiments:

$$
\begin{aligned}
\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} + a \frac{\partial u}{\partial y} &= 0;
\qquad t > 0, \quad 0 < x, y < 1 \\
u(x, y, 0) &= 0 \\
u(0, y, t) &= u_0(y) = \sin(2\pi m y) \\
u(x, 0, t) &= u(x, 1, t)
\end{aligned}
\tag{16}
$$

where $a > 0$ and $m > 0$ is an integer that determines the number of wave lengths
along the y-axis. The sine wave along the y-axis will propagate into the domain along
characteristics in a direction that depends on $a$. The exact steady state solution to
(16) is $u = u_0(y - ax)$. The convergence of the 1D equation (11) gives almost grid
independent results as the grid is refined [3] and is therefore omitted here.

The idea with this test case was to see how the convergence is influenced as the
number of grids increases. More specifically, what happens when the sine wave in
(16) is poorly or not at all resolved on coarser grids? How many grids can be used
and how does the dissipation influence the rate of convergence?

Figure 8 shows the converged solutions on the different grids at $y = 0$ for a central
scheme where $a = 0.5$, $m = 4$, $\kappa^{(4)} =$ [illegible] and $Q = 0$. The converged
solution is well represented on the three finest grids; the deviation from the exact
solution then grows as the size of the grid is reduced.

Figure 8: Converged solution for the 2D scalar equation at $y = 0$ for the central
scheme, $\kappa^{(4)} =$ [illegible], $Q = 0$. Constants $a = 0.5$, $m = 4$.

In Figure 9 the rate of convergence is plotted for the same central scheme for grid
levels 1 to 5. A linear prolongation ($m_p = 2$) is used with a lower order restriction
($m_r = 1$). This case does not converge if $m_p = 1$ even though the Fourier analysis
in Figure 4 gives stable eigenvalues. The CFL number is CFL = 2 using the five-stage
scheme previously mentioned; sawtooth cycles are used. The convergence rates
are shown as the fine grid size is increased from $N = 64$ to $N = 512$ cells in each
direction. The convergence is very slow until the low frequency errors are pushed out
of the computational domain. From then on the high frequency errors that are left in
the domain are effectively damped by the smoother, leading to a fast convergence. For
the finest grid the best convergence rate is achieved using three grid levels. However,
kinks on the curves of convergence for the four-grid and five-grid cases occur after a
while, which make these cases require more iterations than the three-grid case. Notice
also that multigrid gives almost no speedup for the coarser grid.

Further increase in the accuracy of the grid transfer operators has a very small 
influence on the convergence rate. However, if the dissipation is increased on the 
finest grid as shown in Figure 10, the kinks become smaller and almost vanish for the 
first order upwind scheme where all grid levels contribute to an increased speed of 
convergence. The solution for this dissipative scheme is poorly resolved, though, on
all grids [3].

If the residuals are to be brought to machine accuracy, the amount of dissipation 
is very important to gain from several grid levels in multigrid. If, on the other hand, 
the iterations are to be interrupted when the error reaches the level of truncation 
error, then almost all grid levels will contribute to the convergence [3]. 

Figure 9: Rate of convergence with the central scheme, $\kappa^{(4)} =$ [illegible],
$Q = 0$, $a = 0.5$, $m = 4$, CFL = 2.0, $m_p = 2$, $m_r = 1$. a) $N = 64$. b) $N = 128$.
c) $N = 256$. d) $N = 512$.

Figure 10: Rate of convergence, $a = 0.5$, $m = 1$, CFL = 2.0, $m_p = 2$, $m_r = 2$,
$N = 512$. a) Central, $\kappa^{(4)} = 0.01$. b) Central, $\kappa^{(4)} = 0.0625$.
c) Central, $\kappa^{(4)} = 0.2$ (CFL = 1.5). d) Upwind, $Q = 1$.

For a hyperbolic linear system of equations where the steady state equations are
elliptic the situation is different. Equations (17) are hyperbolic in time but converge
to the elliptic Laplace equation. The equations are solved in the same way as the 2D
scalar equation. As can be seen from Figure 11 there is a speedup from all grid levels.


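$$
\begin{aligned}
\frac{\partial}{\partial t}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\frac{\partial}{\partial x}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}
+ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\frac{\partial}{\partial y}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} &= 0,
\qquad t > 0, \quad 0 < x, y < 1 \\
u_1(x, y, 0) = u_2(x, y, 0) &= 0 \\
u_1(0, y, t) &= \sin(2\pi m y) \\
u_2(1, y, t) &= 0 \\
u_1(x, 0, t) &= u_1(x, 1, t) \\
u_2(x, 0, t) &= u_2(x, 1, t)
\end{aligned}
\tag{17}
$$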



Figure 11: Rate of convergence with 2D elliptic equation, $m = 4$. Five stage
Runge-Kutta, CFL = 2.0, $N = 256$. a) central scheme, $\kappa^{(4)} =$ [illegible],
$Q = 0$, $m_p = 2$, $m_r = 1$. b) upwind scheme, $\kappa^{(4)} = 0$, $Q_{j+1/2} = 1$,
[illegible], $m_p = 1$, $m_r = 1$.


For all cases the resolution of the solution on coarser grids appears to be of small 
practical importance. The difference between the solutions on coarser and finer grids 
lies in the high frequencies damped by the smoother. If a large amount of numerical 
dissipation for the hyperbolic steady state problem is used, the convergence rates 
are similar to the ones of the elliptic steady state problem. The high frequencies 
are well damped and all grid levels contribute to an increased speed of convergence 
even though the solution is poorly resolved on coarser grids. If a smaller amount 
of numerical dissipation is used, however, the solution itself is better represented on 
coarser grids. There is still a small difference in the solutions on finer and coarser grids. 
The difference lies in the intermediate frequencies that are not damped very well by 
the smoother. This difference and the fact that the exact operator can vanish cause 
the convergence with multigrid to deteriorate for hyperbolic steady state problems.
Convergence close to being grid independent is only obtained in one dimension and
for the elliptic steady state problem.

Finally some results for the Euler equations are presented. In Figure 12 the rate 
of convergence is plotted for a 2D transonic calculation over a NACA 0012 airfoil at 
a Mach number of 0.8 and an angle of attack $\alpha = 1.25°$. The steady state equations
are mixed hyperbolic/elliptic in the dominating subsonic region, and the convergence
resembles, to a large extent, the linear elliptic steady state case above, in which all
grid levels contribute to the convergence.



Figure 12: Rate of convergence over the NACA 0012 airfoil using central scheme
with dissipation coefficients 0.25 and [illegible]. Five stage Runge-Kutta with
CFL = 2.0. $m_p = 1$, $m_r = 2$. a) Fine grid 65 x 17. b) Fine grid 129 x 33.
c) Fine grid 257 x 65.


Figure 13 shows the rate of convergence for a hypersonic hyperboloid-flare problem 
at a Mach number of 8.7 [3]. This problem is axisymmetric and represents the nose of 
a space shuttle. The flow is supersonic almost everywhere. As can be seen the finest 
grid has to be fine enough to gain from multigrid in accordance with the hyperbolic 
steady state problem above.

Figure 13: Rate of convergence for the hyperboloid flare. Five stage Runge-Kutta
with CFL = 1.5. Central scheme with dissipation coefficients 1.0 and 0.0625,
$m_p = 2$, $m_r = 2$. a) finest grid 33 x 17. b) finest grid 65 x 33.
c) finest grid 129 x 65.


CONCLUSION 

The objective was to investigate the influence of the numerical dissipation and the 
resolution of the solution on coarser grids for flow problems with multigrid. 

If a low amount of numerical dissipation is used on the fine grid the multigrid 
cycle is stabilized by increasing the accuracy of the grid transfer operators, by using 
a Runge-Kutta scheme with a low order of accuracy and/or by adding more numer- 
ical dissipation on coarser grids. For a higher amount of numerical dissipation, the 
multigrid cycle is stable in any case. 

Numerical results for model problems give grid independent convergence only in 
one dimension and for the 2D elliptic steady state problem. For the 2D hyperbolic 
steady state problem with moderate numerical dissipation the convergence is grid 
independent down to the level of truncation error but deteriorates in multigrid when 
converged further. Only a few grid levels where the solution is over resolved contribute 
to an increased speed of convergence. This is explained by the small numerical dissipa- 
tion leading to dispersion and a vanishing exact operator. The convergence behavior 
for the Euler equations was similar to that of the model problems.

MULTIGRID AND KRYLOV SUBSPACE METHODS FOR THE
DISCRETE STOKES EQUATIONS

HOWARD C. ELMAN* 


Abstract. Discretization of the Stokes equations produces a symmetric indefinite system of lin- 
ear equations. For stable discretizations, a variety of numerical methods have been proposed that 
have rates of convergence independent of the mesh size used in the discretization. In this paper, we 
compare the performance of four such methods: variants of the Uzawa, preconditioned conjugate gra- 
dient, preconditioned conjugate residual, and multigrid methods, for solving several two-dimensional 
model problems. The results indicate that where it is applicable, multigrid with smoothing based on 
incomplete factorization is more efficient than the other methods, but typically by no more than a 
factor of two. The conjugate residual method has the advantage of being both independent of iteration 
parameters and widely applicable. 

Key words. Stokes, multigrid, Krylov subspace, conjugate gradient, conjugate residual, Uzawa 


1. Introduction. Consider the system of partial differential equations

$$
\begin{aligned}
-\Delta u + \nabla p &= f \quad \text{in } \Omega, \\
-\operatorname{div} u &= 0 \quad \text{in } \Omega, \\
u &= 0 \quad \text{on } \partial\Omega, \\
\int_\Omega p &= 0,
\end{aligned}
\tag{1}
$$

where $\Omega$ is a simply connected bounded domain in $\mathbb{R}^d$, $d = 2$ or 3. This
system, the Stokes equations, is a fundamental problem arising in computational fluid
dynamics; see, e.g., [7, 12, 14, 17]; $u$ is the $d$-dimensional velocity vector defined on
$\Omega$, and $p$ represents pressure.

Discretization of (1) by finite difference or finite element techniques leads to a 
linear system of equations of the form 


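$$
\begin{pmatrix} A & B^T \\ B & -C \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
= \begin{pmatrix} f \\ 0 \end{pmatrix}, \tag{2}
$$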



where $A$ is a set of uncoupled discrete Laplacian operators and $C$ is a positive
semidefinite matrix. We consider here only stable discretizations, i.e., those for which
the condition number of the Schur complement matrix $BA^{-1}B^T + C$ is bounded
independently of the mesh size used in the discretization. For finite element
discretizations with $C = 0$, this is a consequence of the inf-sup condition and upper bound

$$
\gamma \le \inf_{q} \sup_{v} \frac{(q, \operatorname{div} v)}{|v|_1 \, \|q\|_0},
\qquad
\frac{|(q, \operatorname{div} v)|}{|v|_1 \, \|q\|_0} \le \Gamma,
$$

where $\gamma$ and $\Gamma$ are independent of the mesh size. Here, $|\cdot|_1$ and
$\|\cdot\|_0$ denote the $H^1$ seminorm and $L^2$ norm, respectively, on the discrete
velocity and pressure spaces, and the bounds are taken over all $v$ and $q$ in the
appropriate discrete spaces; see [7, 12, 14, 17].

In recent years, a variety of iterative algorithms have been devised for solving the 
discrete Stokes equations. In this paper, we compare the performance of four such 
methods: 

1. a variant of the Uzawa method; 

2. a preconditioned conjugate gradient (PCG) method applied to a transformed 
version of (2); 

3. a preconditioned conjugate residual (PCR) method; 

4. multigrid (MG). 

The Uzawa method is the first among these to have been devised [2] and it is often
advocated as an efficient solution technique; see, e.g., [7, 12, 14]. The convergence
factor associated with it is proportional to $(\kappa - 1)/(\kappa + 1)$, where $\kappa$ is the
condition number of the Schur complement $BA^{-1}B^T + C$ (see §2.5). The conjugate
gradient method, developed by Bramble and Pasciak [5], has a convergence factor
proportional to $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ but a larger cost per step than the Uzawa
method. The preconditioned conjugate residual method was developed by Rusten and
Winther [24], Silvester and Wathen [26], and Wathen and Silvester [31], and its
convergence behavior is determined by properties of the indefinite matrix. For multigrid,
we consider versions derived from two smoothing strategies: a variant of the distributed
Gauss-Seidel method of Brandt and Dinar [6], and the technique based on incomplete
factorization developed by Wittum [35]; we refer to these as MG/DGS and MG/ILU,
respectively.

These methods all have the property that for appropriate choice of preconditioners
(or for multigrid, smoothers), their convergence rates are independent of the mesh
size used in the discretization. The actual costs of using them depend on both the
convergence rate and the cost per iteration. Our goal in this paper is to compare costs,
in operation counts, of using each of the methods to solve three discrete versions of (1).
For convergence to be independent of mesh size, the first three methods (Krylov
subspace methods) require a preconditioning operator spectrally equivalent to the discrete
Laplacian. In an effort to unify the comparison of these ideas with multigrid, we also
implement this preconditioner using a multigrid method for the associated Poisson
equation. (Thus, the Krylov subspace methods can themselves be viewed as variants
of multigrid.) Our main conclusions are as follows. For problems where it is applicable,
one version of multigrid, using incomplete factorization, requires the fewest iterations
and operations, but it is only marginally faster, i.e., by factors of approximately 1.5 to
2, than the Krylov subspace methods and the distributed Gauss-Seidel method. The
Krylov subspace methods are more widely applicable than either multigrid method.
Among the Krylov subspace methods, the conjugate residual method is slightly slower
than the conjugate gradient method and in some cases the Uzawa method, but it has
the advantage of not requiring any parameter estimates.

An outline of the rest of the paper is as follows. In §2, we present the solution 
algorithms and give an overview of their convergence properties. In §3, we specify four 
benchmark problems and the computational costs per iteration of each of the solution 
methods. In §4, we present the numerical comparison. 

2. Overview of methods. In this section, we present the four algorithms under
consideration and outline their convergence properties. The first three methods
depend on a preconditioning operator $Q_A$ that approximates the matrix $A$ of (2). We
assume that $Q_A$ is symmetric positive definite (SPD) and that

$$
\eta_1 \le \frac{(v, Av)}{(v, Q_A v)} \le \eta_2, \tag{3}
$$

where $\eta_1$ and $\eta_2$ are independent of the mesh size used in the discretization. In
addition, finite element discretizations of (1) have a mass matrix $M$ associated with
the pressure discretization.$^1$ The preconditioner will also include an SPD approximation
$Q_M$ of $M$. Discussions of computational costs will be made in terms of various matrix
operations together with inner products and "axpy's," i.e., vector operations of the
form $y \leftarrow \alpha x + y$.
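
The two vector kernels used for cost counting can be written in a few lines. This is an
illustrative sketch only; the names `axpy` and `dot` and the use of plain Python lists
are assumptions made here, not part of the paper.

```python
from typing import List

Vector = List[float]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha*x + y (the 'axpy' operation) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x: Vector, y: Vector) -> float:
    """Euclidean inner product (x, y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


if __name__ == "__main__":
    print(axpy(2.0, [1.0, 2.0], [3.0, 4.0]))
    print(dot([1.0, 2.0], [3.0, 4.0]))
```

Both kernels cost a number of operations proportional to the vector length, which is
why they are counted separately from matrix operations.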

2.1. The inexact Uzawa method. We use the following "inexact" version of
the Uzawa algorithm [11], which starts with $u_0 = 0$ and an arbitrary initial guess $p_0$:

(4)
    for i = 0 until convergence, do
        $u_{i+1} = u_i + Q_A^{-1}(f - (Au_i + B^T p_i))$
        $p_{i+1} = p_i + \alpha Q_M^{-1}(Bu_{i+1} - Cp_i)$
    enddo

Here, $\alpha$ is a scalar parameter that must be determined prior to the iteration.

In the "exact" version of this algorithm, $Q_A = A$ and the first step is equivalent
to solving the linear system $Au_{i+1} = f - B^T p_i$. When $Q_M = I$, the exact algorithm
is then a fixed parameter first order Richardson iteration applied to the Schur
complement system $(BA^{-1}B^T + C)p = BA^{-1}f$; $Q_M$ is a preconditioner for this iteration.
The inexact Uzawa algorithm (4) replaces the exact computation of $A^{-1}(f - B^T p_i)$
with an approximation.
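
Iteration (4) can be sketched for a tiny dense saddle-point system as follows. This is a
hedged illustration rather than the author's implementation: the function name
`inexact_uzawa`, the list-based matrices, and the toy data (a diagonal $A$ with
$Q_A = 1.25A$, $Q_M = I$, $C = 0$, $\alpha = 1$) are all assumptions chosen so that the
iteration visibly converges.

```python
def matvec(M, x):
    """Dense matrix-vector product for list-of-rows matrices."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inexact_uzawa(A, B, C, f, apply_QA_inv, apply_QM_inv, alpha, p0, iters):
    """Run `iters` steps of iteration (4), starting from u_0 = 0 and p_0 = p0."""
    Bt = transpose(B)
    u = [0.0] * len(A)
    p = list(p0)
    for _ in range(iters):
        # u_{i+1} = u_i + QA^{-1} (f - (A u_i + B^T p_i))
        Au, Btp = matvec(A, u), matvec(Bt, p)
        r = [fi - (a + b) for fi, a, b in zip(f, Au, Btp)]
        u = [ui + di for ui, di in zip(u, apply_QA_inv(r))]
        # p_{i+1} = p_i + alpha QM^{-1} (B u_{i+1} - C p_i)
        Bu, Cp = matvec(B, u), matvec(C, p)
        s = [a - b for a, b in zip(Bu, Cp)]
        p = [pi + alpha * di for pi, di in zip(p, apply_QM_inv(s))]
    return u, p


# Toy problem (assumed data): two velocity unknowns, one pressure unknown.
A = [[2.0, 0.0], [0.0, 4.0]]
B = [[1.0, 1.0]]
C = [[0.0]]
f = [2.0, 4.0]


def qa_inv(r):
    """Action of QA^{-1} with QA = 1.25 * A (a deliberately inexact solve)."""
    return [r[0] / 2.5, r[1] / 5.0]


def qm_inv(s):
    """Action of QM^{-1} with QM = I."""
    return list(s)


if __name__ == "__main__":
    u, p = inexact_uzawa(A, B, C, f, qa_inv, qm_inv, alpha=1.0, p0=[0.0], iters=60)
    print(u, p)  # approaches u = (-1/3, 1/3), p = 8/3
```

Only the actions of $Q_A^{-1}$ and $Q_M^{-1}$ are needed, so they are passed as callables;
in the paper's setting the first would be one multigrid V-cycle.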


2.2. A preconditioned conjugate gradient method. Let $\mathcal{A}$ denote the
coefficient matrix of (2). Premultiplication of (2) by the matrix

$$
T = \begin{pmatrix} Q_A^{-1} & 0 \\ BQ_A^{-1} & -I \end{pmatrix}
$$

produces the equivalent system

$$
\begin{pmatrix} Q_A^{-1}A & Q_A^{-1}B^T \\ BQ_A^{-1}A - B & BQ_A^{-1}B^T + C \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
= \begin{pmatrix} Q_A^{-1}f \\ BQ_A^{-1}f \end{pmatrix}. \tag{5}
$$

Let $\mathcal{M} = T\mathcal{A}$ denote the coefficient matrix of this system. The conjugate
gradient method (CG) developed in [5] requires that the bilinear form

$$
\left[ \begin{pmatrix} v_1 \\ q_1 \end{pmatrix}, \begin{pmatrix} v_2 \\ q_2 \end{pmatrix} \right]
= ((A - Q_A)v_1, v_2) + (q_1, q_2) \tag{6}
$$

define an inner product. Equivalently, the preconditioning operator $Q_A$ must satisfy
(3) with $\eta_1 > 1$. It is shown in [5] that $\mathcal{M}$ is SPD with respect to the inner
product (6), so that CG in this inner product is applicable. The matrix

$$
G = \begin{pmatrix} I & 0 \\ 0 & Q_M \end{pmatrix} \tag{7}
$$

is also SPD with respect to (6), so that this can be used as a preconditioner.
Let

$$
X_0 = \begin{pmatrix} u_0 \\ p_0 \end{pmatrix}, \qquad
R_0 = \begin{pmatrix} f - (Au_0 + B^T p_0) \\ -(Bu_0 - Cp_0) \end{pmatrix}
$$

denote an arbitrary guess for the solution and the associated residual. An
implementation of PCG is given below. Except for the nonstandard inner product, it is the
standard implementation, as given for example in [15, p. 529]. It is more efficient than
the version given in [5]. The preconditioner $Q_A$ is implicitly incorporated into the
inner product. The use of a preconditioner (7) is new.

$^1$ If the finite element solution is expressed using a given basis $\{\phi_i\}$ as
$p = \sum_i \delta_i \phi_i$, then $\|p\|_{L^2} = (\delta, M\delta)^{1/2}$.


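    $\tilde{R}_0 = TR_0$,  $\hat{R}_0 = G^{-1}\tilde{R}_0$
    $P_0 = \hat{R}_0$,  $\mathcal{M}P_0 = T\mathcal{A}P_0$
    $\alpha_0^{(n)} = [\hat{R}_0, \tilde{R}_0]$,  $\alpha_0^{(d)} = [P_0, \mathcal{M}P_0]$,  $\alpha_0 = \alpha_0^{(n)}/\alpha_0^{(d)}$
    $X_1 = X_0 + \alpha_0 P_0$
    $R_1 = R_0 - \alpha_0 \mathcal{A}P_0$,  $\tilde{R}_1 = \tilde{R}_0 - \alpha_0 \mathcal{M}P_0$,  $\hat{R}_1 = G^{-1}\tilde{R}_1$
    for i = 1 until convergence, do
        $\beta_{i-1}^{(n)} = [\hat{R}_i, \tilde{R}_i]$,  $\beta_{i-1}^{(d)} = \alpha_{i-1}^{(n)}$,  $\beta_{i-1} = \beta_{i-1}^{(n)}/\beta_{i-1}^{(d)}$
        $P_i = \hat{R}_i + \beta_{i-1}P_{i-1}$,  $\mathcal{M}P_i = T\mathcal{A}P_i$
        $\alpha_i^{(n)} = \beta_{i-1}^{(n)}$,  $\alpha_i^{(d)} = [P_i, \mathcal{M}P_i]$,  $\alpha_i = \alpha_i^{(n)}/\alpha_i^{(d)}$
        $X_{i+1} = X_i + \alpha_i P_i$,  $R_{i+1} = R_i - \alpha_i \mathcal{A}P_i$
        $\tilde{R}_{i+1} = \tilde{R}_i - \alpha_i \mathcal{M}P_i$,  $\hat{R}_{i+1} = G^{-1}\tilde{R}_{i+1}$
    enddo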


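To help identify operation counts, we describe the computation of $\{\alpha_i\}$ and
$\{\beta_i\}$ in more detail. Letting

$$
R_i = \begin{pmatrix} r_i \\ s_i \end{pmatrix}, \qquad
\tilde{R}_i = \begin{pmatrix} \tilde{r}_i \\ \tilde{s}_i \end{pmatrix}, \qquad
\hat{R}_i = \begin{pmatrix} \hat{r}_i \\ \hat{s}_i \end{pmatrix},
$$

we have $\alpha_i^{(n)} = [\hat{R}_i, \tilde{R}_i] = (\hat{r}_i, A\tilde{r}_i - r_i) + (\hat{s}_i, \tilde{s}_i)$;
similarly, if

$$
P_i = \begin{pmatrix} c_i \\ d_i \end{pmatrix}, \qquad
\mathcal{A}P_i = \begin{pmatrix} v_i \\ w_i \end{pmatrix}, \qquad
\mathcal{M}P_i = \begin{pmatrix} Q_A^{-1}v_i \\ BQ_A^{-1}v_i - w_i \end{pmatrix}, \tag{8}
$$

then $\alpha_i^{(d)} = [P_i, \mathcal{M}P_i] = (c_i, AQ_A^{-1}v_i - v_i) + (d_i, BQ_A^{-1}v_i - w_i)$.
$Q_A$ is referenced only in the construction of $Q_A^{-1}v$ in (8), so that only the action of
the inverse of $Q_A$ is required. Moreover, although the vectors $A\tilde{r}_i$, $Ac_i$ (for $v_i$)
and $AQ_A^{-1}v_i$ are used, the first two of these can be computed using an AXPY.
Consequently, only one matrix-vector product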
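by $A$ is needed.

2.3. The preconditioned conjugate residual method. Since $\mathcal{A}$ is symmetric,
variants of the conjugate residual method are applicable. Let $X_0$ denote the initial
guess and $R_0$ its residual. The following algorithm implements the Orthomin version
of PCR with preconditioner $Q$ [3]:$^2$

    $\hat{R}_0 = Q^{-1}R_0$,  $P_0 = \hat{R}_0$,  $S_0 = Q^{-1}\mathcal{A}P_0$
    $\alpha_0^{(n)} = (\hat{R}_0, \mathcal{A}P_0)$,  $\alpha_0^{(d)} = (\mathcal{A}P_0, S_0)$,  $\alpha_0 = \alpha_0^{(n)}/\alpha_0^{(d)}$
    $X_1 = X_0 + \alpha_0 P_0$,  $R_1 = R_0 - \alpha_0 \mathcal{A}P_0$,  $\hat{R}_1 = \hat{R}_0 - \alpha_0 S_0$
    for i = 1 until convergence, do
        $\beta_{i-1}^{(n)} = -(\mathcal{A}\hat{R}_i, S_{i-1})$,  $\beta_{i-1}^{(d)} = \alpha_{i-1}^{(d)}$,  $\beta_{i-1} = \beta_{i-1}^{(n)}/\beta_{i-1}^{(d)}$
        $P_i = \hat{R}_i + \beta_{i-1}P_{i-1}$,  $\mathcal{A}P_i = \mathcal{A}\hat{R}_i + \beta_{i-1}\mathcal{A}P_{i-1}$,  $S_i = Q^{-1}\mathcal{A}P_i$
        $\alpha_i^{(n)} = (\hat{R}_i, \mathcal{A}P_i)$,  $\alpha_i^{(d)} = (\mathcal{A}P_i, S_i)$,  $\alpha_i = \alpha_i^{(n)}/\alpha_i^{(d)}$
        $X_{i+1} = X_i + \alpha_i P_i$,  $R_{i+1} = R_i - \alpha_i \mathcal{A}P_i$,  $\hat{R}_{i+1} = \hat{R}_i - \alpha_i S_i$
    enddo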


Any symmetric positive-definite $Q$ could be used as a preconditioner. As in [26], we
use

$$
Q = \begin{pmatrix} Q_A & 0 \\ 0 & Q_M \end{pmatrix}.
$$
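
An Orthomin-style preconditioned conjugate residual iteration with a block diagonal
preconditioner can be sketched as follows. This is an illustrative sketch under stated
assumptions, not the author's code: the name `pcr_orthomin`, the dense list-based
matrices, the residual-norm stopping test, and the 2 x 2 toy system (with $A = 2$,
$B = 1$, $C = 0$, $Q_A = 2$, $Q_M = 1$) are all invented for the example, and no
Orthodir fallback for breakdown is included.

```python
def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    return [alpha * a + b for a, b in zip(x, y)]


def pcr_orthomin(Amat, b, apply_Q_inv, x0, tol=1e-12, maxit=50):
    """Orthomin PCR for a symmetric (possibly indefinite) matrix Amat."""
    x = list(x0)
    r = axpy(-1.0, matvec(Amat, x), b)   # R = b - A x
    rh = apply_Q_inv(r)                  # preconditioned residual Q^{-1} R
    p = list(rh)                         # search direction P
    Ap = matvec(Amat, p)                 # A P
    for _ in range(maxit):
        if dot(r, r) ** 0.5 < tol:
            break
        s = apply_Q_inv(Ap)              # S = Q^{-1} A P
        alpha_d = dot(Ap, s)
        alpha = dot(rh, Ap) / alpha_d
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rh = axpy(-alpha, s, rh)
        Arh = matvec(Amat, rh)
        beta = -dot(Arh, s) / alpha_d    # enforces (A P_new, Q^{-1} A P_old) = 0
        p = axpy(beta, p, rh)            # P_new = Rhat + beta P
        Ap = axpy(beta, Ap, Arh)         # A P_new by recurrence, no extra matvec
    return x


# Toy saddle-point system (assumed data): [[A, B^T], [B, -C]] with A=2, B=1, C=0.
K = [[2.0, 1.0], [1.0, 0.0]]
rhs = [2.0, 0.0]


def q_inv(v):
    """Block diagonal preconditioner Q = diag(Q_A, Q_M) with Q_A = 2, Q_M = 1."""
    return [v[0] / 2.0, v[1] / 1.0]


if __name__ == "__main__":
    print(pcr_orthomin(K, rhs, q_inv, [0.0, 0.0]))  # close to u = 0, p = 2
```

The recurrence for `Ap` mirrors the update $\mathcal{A}P_i = \mathcal{A}\hat{R}_i +
\beta_{i-1}\mathcal{A}P_{i-1}$ in the listing, so each step needs one product with
$\mathcal{A}$ and one application of $Q^{-1}$.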


2.4. Multigrid. As is well known, multigrid methods combine iterative methods
to smooth the error with correction derived from a coarse grid computation. We use
V-cycle multigrid for "transformed systems." Our description follows [34, 35]. Compare
[22, 30] for other multigrid methods derived from the squared system associated with
(2).

Let $-\Delta_p$ denote the Laplace operator defined on the pressure space, with Neumann
boundary conditions (see [16]), and let $A_p$ be a discrete approximation to $-\Delta_p$
defined on the pressure grid. Consider the following transformed version of (2):


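$$
\begin{pmatrix} A & B^T \\ B & -C \end{pmatrix}
\begin{pmatrix} I & B^T \\ 0 & -A_p \end{pmatrix}
\begin{pmatrix} \hat{u} \\ \hat{p} \end{pmatrix}
= \begin{pmatrix} f \\ 0 \end{pmatrix}. \tag{9}
$$

The coefficient matrix in (9) is

$$
\hat{\mathcal{A}} = \begin{pmatrix} A & B^T \\ B & -C \end{pmatrix}
\begin{pmatrix} I & B^T \\ 0 & -A_p \end{pmatrix}
= \begin{pmatrix} A & W \\ B & G \end{pmatrix}, \tag{10}
$$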

where $W = AB^T - B^T A_p$ and $G = BB^T + CA_p$. For appropriate discretizations of (1)
(see §3), $W$ is of low rank, with nonzero entries only in rows corresponding to mesh
points next to $\partial\Omega$. When $C = 0$, $G$ can also be viewed as a discretization of
$-\Delta_p$. The splitting

$$
\hat{\mathcal{A}} = S - N \tag{11}
$$

of the coefficient matrix of (10) then induces a stationary iteration applicable to (2),

$$
\begin{pmatrix} u_{k+1} \\ p_{k+1} \end{pmatrix}
= \begin{pmatrix} u_k \\ p_k \end{pmatrix}
+ \begin{pmatrix} I & B^T \\ 0 & -A_p \end{pmatrix} S^{-1}
\begin{pmatrix} f - (Au_k + B^T p_k) \\ -(Bu_k - Cp_k) \end{pmatrix}. \tag{12}
$$

This is used as the smoother for the multigrid solver for (2). Specific choices for $S$ are
given in §3.2.

$^2$ It is possible for this version of PCR to break down, with $\alpha_i = 0$. The Orthodir
version, which uses a three-term recurrence to generate $P_i$, is guaranteed not to break
down; it requires two additional axpy's. Our implementation switches from the Orthomin
to Orthodir direction update if $|\alpha_i| < 10^{-4}$, as described in [9]. In the experiments
discussed in §4, this switch never took place.

Let $R_u$ denote a restriction operator mapping velocity vectors in the fine grid (of
width $h$) to the coarse grid (of width $2h$), let $R_p$ similarly denote the restriction
operator for the discrete pressure space, and let $P_u$ and $P_p$ denote prolongation
operators from the coarse spaces to the fine spaces. (For simplicity, we are omitting
explicit mention of $h$ in this notation.) One step of V-cycle multigrid for solving (2),
starting with initial guess $u^0$, $p^0$, is as follows.

    $(u^1, p^1) = MG(u^0, p^0, f, g, k_1, k_2, h)$
    if $h < h_0$, then    % Recursive call
        Starting with $u^0, p^0$, perform $k_1$ smoothing steps (12), producing $u^{1/3}, p^{1/3}$
        $r^{1/3} = f - (Au^{1/3} + B^T p^{1/3})$,  $s^{1/3} = g - (Bu^{1/3} - Cp^{1/3})$
        $r_c^{1/3} = R_u r^{1/3}$,  $s_c^{1/3} = R_p s^{1/3}$
        $(u_c^{2/3}, p_c^{2/3}) = MG(0, 0, r_c^{1/3}, s_c^{1/3}, k_1, k_2, 2h)$
        $u^{2/3} = u^{1/3} + P_u u_c^{2/3}$,  $p^{2/3} = p^{1/3} + P_p p_c^{2/3}$
        Starting with $u^{2/3}, p^{2/3}$, perform $k_2$ smoothing steps (12), producing $u^1, p^1$
    else    % Coarse grid solve when $h = h_0$
        Solve the system with coefficient matrix of (2) and right-hand side $(f, g)$ for $(u^1, p^1)$ directly
    endif

We also use V-cycle multigrid derived from the discrete Laplacian as a preconditioner
to approximate the action of $A^{-1}$ for the Krylov subspace methods; this is defined
analogously and we omit the details. For all multigrid methods, we use bilinear
interpolation to define $P_u$ and $P_p$, and $R_u = P_u^T$, $R_p = P_p^T$. The discrete operators
at each level are derived from the discretization on the associated grid.
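
The choice of restriction as the transpose of prolongation can be illustrated in one
dimension. The sketch below is an assumption-laden analogue, not the paper's 2D
bilinear operators: it builds linear interpolation `P` on the interior nodes of a uniform
1D grid, takes `R = P^T`, and checks that for the 1D linear finite element stiffness
matrix the product `R A_h P` coincides with the matrix discretized directly on the
coarse grid. All function names are hypothetical.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def prolongation_1d(nc):
    """Linear interpolation from nc coarse to 2*nc + 1 fine interior nodes."""
    nf = 2 * nc + 1
    P = [[0.0] * nc for _ in range(nf)]
    for j in range(nc):
        i = 2 * j + 1            # fine node that coincides with coarse node j
        P[i][j] = 1.0
        P[i - 1][j] = 0.5        # fine neighbours get the average weight
        P[i + 1][j] = 0.5
    return P


def stiffness_1d(n, h):
    """1D linear finite element stiffness matrix (1/h) * tridiag(-1, 2, -1)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h
        if i > 0:
            A[i][i - 1] = -1.0 / h
        if i < n - 1:
            A[i][i + 1] = -1.0 / h
    return A


def coarse_operators(nc, h):
    """Return (R A_h P, A_2h) for a fine grid of width h."""
    P = prolongation_1d(nc)
    R = transpose(P)             # restriction as transpose of prolongation
    A_h = stiffness_1d(2 * nc + 1, h)
    return matmul(R, matmul(A_h, P)), stiffness_1d(nc, 2.0 * h)


if __name__ == "__main__":
    galerkin, direct = coarse_operators(3, 0.125)
    print(galerkin)
    print(direct)
```

In this 1D finite element setting the two coarse operators agree, which is consistent
with deriving the operator at each level from the discretization on that grid.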


2.5. Convergence properties. We briefly outline some convergence properties
of these methods; see the primary references for derivations of bounds. Each of the
methods generates a sequence of iterates $u_i \approx u$, $p_i \approx p$ such that, if $e_i$ is a
representation of the error, then $\lim_{i \to \infty} (\|e_i\|/\|e_0\|)^{1/i} = \rho$ for some norm
$\|\cdot\|$. We refer to $\rho$ as the convergence factor.

We are assuming that the discretization and choice of $Q_M$ are such that

$$
\lambda_1 \le \frac{(q, (BA^{-1}B^T + C)q)}{(q, Q_M q)} \le \lambda_2, \tag{13}
$$

where $\lambda_1$ and $\lambda_2$, and therefore $\kappa = \lambda_2/\lambda_1$, are bounded
independently of the mesh size of the discretization. This is the case, for example, when
$Q_M$ is a suitable approximation of the mass matrix in finite element discretization
[29, 32]. Note that $\kappa$ is the spectral condition number of $Q_M^{-1}(BA^{-1}B^T + C)$.

The exact Uzawa algorithm has convergence factor $\rho(I - \alpha Q_M^{-1}(BA^{-1}B^T + C))$
[12]. This is smallest for the choice $\alpha = 2/(\lambda_1 + \lambda_2)$, in which case it has the
value $(\kappa - 1)/(\kappa + 1)$. Thus, the convergence factor for the Uzawa algorithm is
independent of the mesh. It is shown in [11] that the performance of the inexact Uzawa
algorithm is close to that of the exact one if the iterate satisfies

$$
\|f - B^T p_i - Au_{i+1}\|_2 \le \tau \|Bu_i - Cp_i\|_{Q_M^{-1}}, \tag{14}
$$

where $\tau$ is independent of the mesh size.

The PCG method is analyzed in [5, Theorem 1], where it is shown that the condition
number of the coefficient matrix $\mathcal{M}$ of (5) is bounded by a constant proportional
to $\kappa$. Thus, standard results for CG [15] imply that the bound on the convergence
factor for this method is proportional to $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$. The constant of
proportionality depends on how close $\eta_1$ is to $\eta_2$ in (3), i.e., how well $Q_A$
approximates $A$.
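
The dependence of the Uzawa and CG-type factors on $\kappa$ can be checked numerically.
The following is a small sketch with hypothetical function names; it treats the exact
Uzawa iteration as Richardson iteration on an operator with spectrum in
$[\lambda_1, \lambda_2]$ and uses $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ only as the reference
quantity to which the PCG bound is proportional.

```python
import math


def optimal_uzawa(lam1: float, lam2: float):
    """Return (alpha, rho) with alpha = 2/(lam1+lam2), rho = (kappa-1)/(kappa+1)."""
    alpha = 2.0 / (lam1 + lam2)
    kappa = lam2 / lam1
    return alpha, (kappa - 1.0) / (kappa + 1.0)


def richardson_factor(alpha: float, lam1: float, lam2: float) -> float:
    """Largest |1 - alpha*lambda| over the interval [lam1, lam2]."""
    return max(abs(1.0 - alpha * lam1), abs(1.0 - alpha * lam2))


def cg_reference_factor(kappa: float) -> float:
    """(sqrt(kappa) - 1) / (sqrt(kappa) + 1)."""
    root = math.sqrt(kappa)
    return (root - 1.0) / (root + 1.0)


if __name__ == "__main__":
    lam1, lam2 = 1.0, 4.0          # assumed spectral bounds, kappa = 4
    alpha, rho = optimal_uzawa(lam1, lam2)
    print(alpha, rho, richardson_factor(alpha, lam1, lam2))
    print(cg_reference_factor(lam2 / lam1))
```

For the assumed bounds the optimal parameter balances the two endpoint values
$|1 - \alpha\lambda_1|$ and $|1 - \alpha\lambda_2|$, and the square root in the CG-type factor
gives a smaller value for the same $\kappa$.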

The PCR method is analyzed in [24, 26]. The analysis shows that the eigenvalues
of the preconditioned matrix $Q^{-1}\mathcal{A}$ are contained in two intervals
$[-a, -b] \cup [c, d]$, where $a$, $b$, $c$, and $d$ are positive constants that are independent
of the mesh size. The sizes of the intervals depend on $\kappa$ and the accuracy with which
$Q_A$ approximates $A$. It follows from the convergence analysis of CR [9, 27] that the
convergence factor for the preconditioned algorithm is independent of the mesh size.
For example, it is shown [9] that if $d - c = a - b > 0$, then the convergence factor is
bounded by [expression illegible], where $\beta = (bc)/(ad)$.

It is shown in [36] that for finite difference discretization of (1) (see §3.1), two-grid
variants of multigrid are convergent with convergence rate independent of the mesh
size. The analysis applies to the ILU smoothing of §3.2, although it requires that the
prolongation be based on biquadratic interpolation. In practice, bilinear interpolation
has been observed to be sufficient [35]. Fourier analysis in [6] also suggests that
MG/DGS has convergence rate independent of mesh size.

Remark 2.1. Several other proposed methods share properties with the version of
PCG under consideration. In particular, Verfürth [29] has shown that PCG applied
directly to the Schur complement system has convergence factor proportional to
$\rho_{CG}$; however, this method requires accurate computation of the action of $A^{-1}$ at
each CG step [23]. Bank, Welfert, and Yserentant [4] present a method making use of
$Q_A \approx A$ with convergence rate dependent on the accuracy of this approximation, but
using an additional inner iteration on the pressure space.

3. Solution costs. In this section, we outline the computational costs required
to solve three benchmark problems on $\Omega = (0, 1) \times (0, 1)$ for each of the solution
methods of §2.

3.1. Benchmark problems. We use four discretizations to produce test prob- 
lems: “marker and cell” finite differences and three mixed finite element strategies. 

1. Finite differences [19]. This consists of the usual five-point operator for each of
the discrete Laplacian operators of (1), together with centered differences for the first
derivatives $\nabla p$ and $\operatorname{div} u$. For the discretization to be stable, it is necessary
to use staggered grids in $\Omega$. Figure 1 shows such grids on a mesh of width $h = 1/4$. In
order to define the velocity discretizations at grid points next to $\partial\Omega$, certain values
outside $\Omega$ must be extrapolated; for example, this is needed to approximate
$\partial^2 u_1/\partial y^2$ for points "×" next to the bottom of $\partial\Omega$.

Fig. 1. Staggered grids for finite difference discretization. (Legend: • pressure $p$.)

2. Linear/constant finite elements. This choice consists of continuous piecewise linear
velocities on a mesh of width $h$, and piecewise constant pressures on a mesh of width
$2h$. The discrete pressures are not required to be continuous. The coarser pressure
grid ensures that the inf-sup condition holds [17]. We refer to this as the $P_1(h)P_0(2h)$
discretization.

3. Piecewise linear finite elements. Here, continuous piecewise linear velocities on
a mesh of width $h$ are paired with continuous piecewise linear pressures on a mesh
of width $2h$. The inf-sup condition is also satisfied. We call this the $P_1(h)P_1(2h)$
discretization.

4. Stabilized piecewise linear finite elements. A stable discretization using piecewise
linear velocities and pressures on a single mesh can be obtained using a stabilization
matrix $C = \beta h^2 A_n$, where $A_n$ is the discrete Laplace operator defined on the pressure
space, subject to Neumann boundary conditions [8]. This technique is equivalent to the
mini-element discretization [1] after elimination of the internal degrees of freedom. We
use $\beta = 0.025$, as recommended in [25]. We refer to this discretization as $P_1(h)P_1(h)$.
The usual hat functions are used as the bases for linear velocities and pressures.

The coefficient matrix $\mathcal{A}$ of (2) for all these problems, as well as $B^T$, $C$, and
$BA^{-1}B^T + C$, are rank deficient by one; the latter three matrices share a constant
null vector. As a result, the discrete pressure solutions are uniquely defined only up to
a constant. In exact arithmetic, the solution methods under consideration correct the
initial guess with quantities orthogonal to the null space of $\mathcal{A}$, so that the component
of the null space in the computed solution is the same as in the initial guess. For the
analysis, the lower bound of (13) refers to the smallest nonzero eigenvalue.

Note that our goal in considering these problems is to compare the performance of 
the different solution strategies on a variety of problems. We highlight some properties 
of each of the problems as follows: 

1. finite differences, stable, #(pressure unknowns) ≈ #(velocity grid points);

2. finite elements, stable, discontinuous pressures, #(pressure unknowns) ≈ 1/2
#(velocity grid points);

3. finite elements, stable, continuous pressures, #(pressure unknowns) ≈ 1/4
#(velocity grid points);

4. finite elements, requires stabilization, continuous pressures, #(pressure
unknowns) ≈ #(velocity grid points).

We are not comparing the accuracy achieved by the discretizations, and remark only 
that the three finite element discretizations display the same asymptotic convergence 
rates. See [17, pp. 29, 50] for comments on accuracy of finite element discretization, 
and [21] for analysis of the finite difference scheme. 

3.2. Preconditioners and smoothers. The Uzawa, PCR, and PCG methods
require choices of $Q_A$ and $Q_M$. For all cases, $Q_A$ consists of one step of V-cycle
multigrid derived from the discrete Laplacian. The smoothing is based on damped
point-Jacobi iteration with damping parameter $\omega = 2/3$ [20], which ensures that
$Q_A$ is symmetric. For the three finite element discretizations, $Q_M$ is chosen to be the
diagonal of the mass matrix $M$; see [32]. (In the case of the $P_1(h)P_0(2h)$ discretization,
$Q_M = M$.) Although there is no mass matrix for finite differences, a natural analogue
in two dimensions is $M = h^2 I$, and this is used for $Q_M$ with finite differences.
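
The effect of the damping parameter on high-frequency error can be estimated with a
standard local Fourier analysis. The sketch below is not taken from the paper: it
assumes the five-point Laplacian stencil on an infinite uniform grid, and the function
names and the sampling resolution `n` are illustrative choices. It samples the
amplification factor of damped point Jacobi over the high frequencies
($\max(|\theta_1|, |\theta_2|) \ge \pi/2$).

```python
import math


def jacobi_symbol(theta1: float, theta2: float, omega: float) -> float:
    """Amplification factor of damped Jacobi for the five-point Laplacian."""
    return 1.0 - omega * (1.0 - 0.5 * (math.cos(theta1) + math.cos(theta2)))


def smoothing_factor(omega: float, n: int = 64) -> float:
    """Largest |symbol| over sampled high frequencies on a (2n+1)^2 grid."""
    worst = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            if 2 * max(abs(i), abs(j)) >= n:      # high-frequency region
                t1, t2 = math.pi * i / n, math.pi * j / n
                worst = max(worst, abs(jacobi_symbol(t1, t2, omega)))
    return worst


if __name__ == "__main__":
    for omega in (2.0 / 3.0, 0.8, 1.0):
        print(omega, smoothing_factor(omega))
```

Under these assumptions the undamped iteration ($\omega = 1$) leaves the most
oscillatory mode undamped, while $\omega = 2/3$ damps every sampled high-frequency mode.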

We consider two multigrid smoothing strategies. The first is a variant of the 
distributed Gauss-Seidel (DGS) iteration introduced by Brandt and Dinar [6]. The 
splitting operator of (11) is given by 


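\[ S = \begin{pmatrix} S_A & 0 \\ B & S_G \end{pmatrix}, \]

so that the smoother (12) has the form 

\[
\begin{aligned}
\hat u_{k+1} &= S_A^{-1}\bigl(f - (A u_k + B^T p_k)\bigr), \\
\hat p_{k+1} &= S_G^{-1}\bigl(-B(u_k + \hat u_{k+1}) + C p_k\bigr), \\
u_{k+1} &= u_k + \hat u_{k+1} + B^T \hat p_{k+1}, \\
p_{k+1} &= p_k - A_p \hat p_{k+1}.
\end{aligned}
\]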


For $S_A$, we use the point Gauss-Seidel matrix derived from red-black ordering of 
the velocity grid. (That is, if $A = D - L - U$ with the red-black ordering, then 
$S_A = D - L$.) For finite differences, $S_G = (1/\omega)T$ where $T$ is the tridiagonal part 
of $G$ and $\omega = 2/3$; that is, $S_G$ corresponds to a damped one-line Jacobi splitting. 
For $P_1(h)P_1(h)$ finite elements, $S_G$ is the block Jacobi matrix derived from a two-line 
ordering of the underlying grid. These are slightly more sophisticated versions of the 
choice $S_G = \mathrm{diag}(G)$ used in [6]. We refer to this multigrid method as MG/DGS. 
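
One sweep of the smoother can be sketched in a few lines. This is a minimal illustration, not the implementation used for the experiments: the matrices are small random stand-ins, `dgs_sweep` is a hypothetical name, the pressure operator `Ap` is taken to be the identity, and the diagonal of `B B^T + C` is used as a stand-in for the splitting of G. The sketch reads the update as a velocity correction, a pressure correction, and a distribution step, with the saddle-point system in the form [A B^T; B -C].

```python
import numpy as np

def dgs_sweep(A, B, C, Ap, SA, SG, f, u, p):
    """One smoothing sweep: velocity correction, pressure correction, distribution."""
    u_hat = np.linalg.solve(SA, f - (A @ u + B.T @ p))      # velocity correction
    p_hat = np.linalg.solve(SG, -B @ (u + u_hat) + C @ p)   # pressure correction
    u_new = u + u_hat + B.T @ p_hat                         # distribute to velocities
    p_new = p - Ap @ p_hat                                  # distribute to pressures
    return u_new, p_new

# Small random stand-in problem (all data below is illustrative only).
rng = np.random.default_rng(0)
n_u, n_p = 6, 3
M = rng.standard_normal((n_u, n_u))
A = M @ M.T + n_u * np.eye(n_u)            # symmetric positive definite block
B = rng.standard_normal((n_p, n_u))
C = 0.1 * np.eye(n_p)
Ap = np.eye(n_p)                            # assumption: placeholder pressure operator
SA = np.tril(A)                             # D - L for A = D - L - U
SG = np.diag(np.diag(B @ B.T + C))          # assumption: diagonal stand-in splitting
f = rng.standard_normal(n_u)

# The exact solution should be a fixed point of the sweep.
K = np.block([[A, B.T], [B, -C]])
sol = np.linalg.solve(K, np.concatenate([f, np.zeros(n_p)]))
u_star, p_star = sol[:n_u], sol[n_u:]
u_next, p_next = dgs_sweep(A, B, C, Ap, SA, SG, f, u_star, p_star)
```

Because both residuals vanish at the exact solution, the two corrections are zero there and the sweep leaves the solution unchanged, which is a basic consistency check for any splitting of this form.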

The other multigrid smoother is the incomplete LU factorization (ILU) presented 
by Wittum [35]. We use an ILU factorization of the matrix A of (10), with no fill-in 
in the factors. The ordering for A is problem dependent. For finite differences, it is 
derived from an uncoupled red-black ordering of the underlying grid. That is, the grid 
values for $u_1$ are listed first, in red-black ordering, followed by those for $u_2$, and then 
those for $p$. (See also Remark 3.3 below.) For $P_1(h)P_1(h)$ finite elements, A is ordered 
according to an uncoupled lexicographic ordering of the grid vectors. We denote this 
method by MG/ILU. 

In choosing preconditioners and smoothers, we have attempted to use methods 
that are suitable for vector and parallel computers. Thus, we are using point Jacobi 
smoothing for multigrid preconditioning, red-black Gauss-Seidel and line Jacobi for 
the DGS iteration, and a red-black ordering for MG/ILU applied to finite differences. 
With the $P_1(h)P_1(h)$ discretization, the operator $G$ in the DGS method is a 19-point 
operator that has block Property A for a two-line ordering of the pressure grid, so 
that the two-line Jacobi splitting can be implemented efficiently in parallel. The ILU 
smoother used with this problem is not efficient on parallel computers. Our multigrid 
strategies do not address the issue of idleness of parallel processors for coarse grid 
computations; see [10, 13] for discussions of this point for the discrete Poisson equation. 

Parameters are required for the Uzawa, PCG and multigrid methods, and for the 
multigrid preconditioner. These are as follows: 

Uzawa: The optimal value of $\alpha$ for the exact Uzawa method, determined empirically, 
is used for the inexact version. This requires computation of the extreme 
eigenvalues of $Q_M^{-1}(B A^{-1} B^T + C)$. 

PCG: As noted in §2.3, the preconditioner must be scaled so that $\eta_1 > 1$ in (3). 
From the results of [5], it is desirable to have $\eta_1$ close to 1. In all tests, the 
scaling is chosen so that $1 < \eta_1 < 1.02$. This requires computation of the 
smallest eigenvalue of $Q_A^{-1}A$.³ 

Multigrid: For the coarse mesh size $h_0$ in multigrid computations, we chose the one 
of $h_0 = 1/2$ and $h_0 = 1/4$ that produced lower iteration counts. This turned 
out to be $h_0 = 1/2$ for preconditioners and $h_0 = 1/4$ for solvers. The coarse 
grid solution is obtained using Cholesky factorization for the preconditioners 
and singular value decomposition for the solvers. 

Remark 3.1. For the Uzawa method, the choice of $Q_A$ does not guarantee that the 
condition (14) is satisfied. The results of [11, 33] as well as those of §4 suggest that 
with multigrid for $Q_A$, (14) may be too stringent. 

Remark 3.2. The effectiveness of the multigrid solvers depends on the fact that the 
commutator $W$ in (10) is zero away from the boundary of $\Omega$. This is true for the finite 
difference and stabilized $P_1(h)P_1(h)$ discretizations, where pressures and velocities 
are defined on the same grid, but not for the (stable) $P_1(h)P_1(2h)$ discretization. 
Our experiments confirm that multigrid is ineffective for this discretization, and we 
do not include it as an option. See [18, p. 248] for a discussion of this issue. For 
the $P_1(h)P_0(2h)$ discretization, it is difficult to define the discrete pressure Poisson 
operator $A_p$, and we know of no multigrid implementation for this problem. 

Remark 3.3. For MG/ILU applied to the finite difference discretization, we also 
tested several alternative ordering strategies, including an uncoupled lexicographic 
ordering (i.e., like that used for $P_1(h)P_1(h)$), as well as several “coupled” lexicographic 
orderings. For the latter strategies, velocity and pressure unknowns are not separated 
from one another; see [28]. The performances of MG/ILU for all these orderings were 
very close. For example, for h = 1/32 as in Table 4 below, the smallest average 
iteration count with one smoothing step was 10| and the largest was ll|. 

3.3. Iteration costs. We identify the costs per iteration of each of the methods 
by first specifying the “high level” operations of which they are composed, and then 
determining the costs of each of these operations. High level operations are defined to 
be matrix-vector products, inner products (denoted “( , )” in the tables of this section), 
and axpy’s. Note that each of the techniques under consideration is formulated with 
essentially the same set of these operations; consequently, we expect operation counts 
to give a good idea of their comparative performance. 

3 In the experiments described in §4, these were computed using a power method applied to 
$Q_A^{-1}A - I$; five to ten steps were needed to obtain an estimate accurate to three significant digits. 

Table 1 
High level operations for all solution algorithms. 

                        Matrix-Vector Product            AXPY               ( , )
Uzawa                   1 A, 1 B^T, 1 Q_A^{-1},          1 (n_p)            1 (n_u + n_p)
                        1 B, 1 C, 1 Q_M^{-1}
PCG                     1 A, 1 B^T, 1 Q_A^{-1},          4 (n_u + n_p),     3 (n_u + n_p)
                        2 B, 1 C, 1 Q_M^{-1}             2 (n_u)
PCR                     1 A, 1 B^T, 1 Q_A^{-1},          5 (n_u + n_p)      4 (n_u + n_p)
                        1 B, 1 C, 1 Q_M^{-1}
Multigrid               (1 + k_1 + k_2) A, 1 R_u,        -                  -
Preconditioner          (k_1 + k_2) S_A^{-1}, 1 P_u
Multigrid Solver        1 A, 1 B^T, 1 R_u,               -                  1 (n_u + n_p)
(Excluding smoother)    1 B, 1 C, 1 R_p,
                        1 P_u, 1 P_p
DGS Smoother            1 A, 2 B^T, 1 A_p,               -                  -
                        1 B, 1 C, 1 S_A^{-1},
                        1 S_G^{-1}
ILU Smoother            1 A, 2 B^T, 1 A_p,               -                  -
                        1 B, 1 C, 1 S^{-1}


The high level operations are shown in Table 1. Matrix-vector products include 
operations with matrices that define the problem or method, such as $A$ or $R_u$, as well 
as preconditioning and smoothing operators such as $Q_A^{-1}$ and $S_A^{-1}$. The latter 
computations are themselves built from other matrix operations, and some of these are 
also identified in the table. All multigrid entries correspond to operations performed 
on one grid level. For multigrid solvers, the smoothing operations are presented 
separately; these operations would be performed $k_1$ times during presmoothing and $k_2$ 
times during postsmoothing. The lengths of the vector operations are listed in 
parentheses. We are assuming that one inner product will be used in the convergence test, 
and the counts in the table include this. 

The costs of matrix-vector products are estimated to be the number of nonzeros in 
the matrices used. This is roughly one half the number of “flops” required, and it is 
also proportional to the number of memory references. These costs, for discretizations 
in which the velocity unknowns come from an $n \times n$ grid, are shown in Table 2. The 
costs of vector operations are taken to be the length of the vectors. 

Combining the data of Tables 1 and 2 gives an estimate for the cost per iteration 
for each of the solution methods under consideration. These numbers are all 
proportional to $n^2$, and we present in Table 3 the cost factors obtained by omitting this 
factor, rounded to the nearest integer. For the multigrid methods (preconditioners 
and solvers), the cost of one full multigrid step is estimated as 4/3 times the cost of 
the computations on the finest grid; this is approximately the cost of full recursive 
multigrid in two dimensions. 
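
As a worked check of this bookkeeping, the following sketch recomputes the finite difference cost factors. It rests on one reading of Tables 1 and 2 (operation counts per method, nonzeros per $n^2$, and vector lengths $n_u = 2n^2$, $n_p = n^2$); the dictionary keys and function names are illustrative and not part of the paper.

```python
# Nonzeros per n^2 for the finite difference column of Table 2 (as read here).
cost = {"A": 10, "B": 4, "Bt": 4, "C": 0, "QM": 1, "SA_jacobi": 2, "SA_gs": 6,
        "SG": 3, "Ap": 5, "Ru": 6, "Pu": 6, "Rp": 3, "Pp": 3, "S_ilu": 19}
n_u, n_p = 2, 1  # vector lengths per n^2: two velocity components, one pressure

def mg_preconditioner(k1, k2):
    """Cost of Q_A^{-1}: one V-cycle, estimated as 4/3 of the finest-grid work."""
    fine = ((1 + k1 + k2) * cost["A"] + cost["Ru"]
            + (k1 + k2) * cost["SA_jacobi"] + cost["Pu"])
    return 4 * fine / 3

def krylov_costs(k1, k2):
    """Cost factors (Uzawa, PCR, PCG) for k1 pre- and k2 post-smoothing steps."""
    common = cost["A"] + cost["Bt"] + mg_preconditioner(k1, k2) + cost["C"] + cost["QM"]
    both = n_u + n_p
    uzawa = common + cost["B"] + n_p + both
    pcr = common + cost["B"] + 5 * both + 4 * both
    pcg = common + 2 * cost["B"] + 4 * both + 2 * n_u + 3 * both
    return round(uzawa), round(pcr), round(pcg)

def mg_solver_cost(k1, k2, smoother):
    """Cost factor of one multigrid solver step with the DGS or ILU smoother."""
    base = (cost["A"] + cost["Bt"] + cost["Ru"] + cost["B"] + cost["C"]
            + cost["Rp"] + cost["Pu"] + cost["Pp"] + (n_u + n_p))
    shared = cost["A"] + 2 * cost["Bt"] + cost["Ap"] + cost["B"] + cost["C"]
    sweep = shared + (cost["SA_gs"] + cost["SG"] if smoother == "dgs" else cost["S_ilu"])
    return round(4 * (base + (k1 + k2) * sweep) / 3)

fd_one_step = krylov_costs(1, 1) + (mg_solver_cost(1, 1, "dgs"), mg_solver_cost(1, 1, "ilu"))
fd_two_steps = krylov_costs(2, 2) + (mg_solver_cost(2, 2, "dgs"), mg_solver_cost(2, 2, "ilu"))
```

Under these assumptions the two tuples agree with the finite difference rows of Table 3, which suggests that the inner product for the convergence test is included in the 4/3 scaling of the multigrid solvers.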
Table 2 
Costs for matrix-vector products. 

                            Fin. Diff.   P_1(h)P_0(2h)   P_1(h)P_1(2h)   P_1(h)P_1(h)
A                           10n^2        10n^2           10n^2           10n^2
B, B^T                      4n^2         4n^2            8n^2            12n^2
C                           0            0               0               5n^2
Q_M                         1n^2         0.25n^2         0.25n^2         1n^2
S_A^{-1} (Jacobi)           2n^2         2n^2            2n^2            2n^2
S_A^{-1} (Gauss-Seidel)     6n^2         6n^2            6n^2            6n^2
S_G^{-1}                    3n^2         -               -               9n^2
A_p                         5n^2         -               -               5n^2
R_u, P_u                    6n^2         4.5n^2          4.5n^2          4.5n^2
R_p, P_p                    3n^2         -               -               2.25n^2
S^{-1}                      19n^2        -               -               41n^2


Table 3 
Cost factors. 

                                     Uzawa   PCR   PCG   MG/DGS   MG/ILU
Finite Differences  k_1 = k_2 = 1    84      107   109   148      175
                    k_1 = k_2 = 2    116     139   141   244      297
P_1(h)P_0(2h)       k_1 = k_2 = 1    79      98    101   -        -
                    k_1 = k_2 = 2    111     130   133   -        -
P_1(h)P_1(2h)       k_1 = k_2 = 1    …       …     …     -        -
                    k_1 = k_2 = 2    …       …     …     -        -
P_1(h)P_1(h)        k_1 = k_2 = 1    101     124   134   247      333
                    k_1 = k_2 = 2    133     156   166   421      591
Table 4 
Iterations. 

                                     Uzawa   PCR   PCG   MG/DGS   MG/ILU
Finite Differences  k_1 = k_2 = 1    36      41    30    24       12
                    k_1 = k_2 = 2    28      33    23    15       9
P_1(h)P_0(2h)       k_1 = k_2 = 1    34      41    29    -        -
                    k_1 = k_2 = 2    26      34    23    -        -
P_1(h)P_1(2h)       k_1 = k_2 = 1    89      57    38    -        -
                    k_1 = k_2 = 2    89      50    31    -        -
P_1(h)P_1(h)        k_1 = k_2 = 1    39      47    32    20       8
                    k_1 = k_2 = 2    38      40    25    10       7


4. Experimental results. We now present the results of numerical experiments 
for solving (2). All experiments were performed in Matlab on a Sparc-10 workstation. 
For each solution algorithm, we solved three problems derived from three choices 
of $f$ consisting of uniformly distributed random numbers in $[-1,1]$. The initial guess 
in all cases was $u_0 = 0$, $p_0 = 0$. The stopping criterion was 

\[ \|R_i\|_2 / \|R_0\|_2 < 10^{-6}, \]

where 

\[ R_i = \begin{pmatrix} f \\ 0 \end{pmatrix} - \begin{pmatrix} A & B^T \\ B & -C \end{pmatrix} \begin{pmatrix} u_i \\ p_i \end{pmatrix}. \]

We found that performance was essentially in the asymptotic range for h = 1/32, and 
all results are for this mesh size. 

We present three types of data: iteration counts, estimates for convergence factors, 
and plots of residual norms as functions of operation counts. The iteration counts are 
averages over three runs of the number of steps needed to satisfy the stopping criterion; 
these are shown in Table 4. The estimates for asymptotic convergence factors are the 
averages of $(\|R_{5+j}\|_2 / \|R_5\|_2)^{1/j}$ over all steps after step five; here $R_k$ represents the 
average of the $k$th residual norm over the three runs. These are shown in Table 5. We 
chose step five rather than step zero because performance was often better in the first 
few steps than later, when the asymptotic behavior is seen. Finally, Figures 2-5 plot 
the averages of the residual norms against operation counts. 
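
The convergence factor estimate can be sketched as follows. This assumes the estimate is the mean over $j \ge 1$ of the $j$th root of the residual reduction measured from step five; the function name and the synthetic residual history are illustrative only.

```python
import numpy as np

def convergence_factor(res_norms, start=5):
    """Mean over j >= 1 of (r[start + j] / r[start]) ** (1 / j)."""
    r = np.asarray(res_norms, dtype=float)
    j = np.arange(1, len(r) - start)          # j = 1, ..., last index - start
    return float(np.mean((r[start + j] / r[start]) ** (1.0 / j)))

# Synthetic residual history that contracts by exactly 0.5 per step.
history = 0.5 ** np.arange(20)
estimate = convergence_factor(history)
```

For a residual history that decays geometrically, every term of the average equals the contraction factor, so the estimate recovers it exactly. Starting at step five discards the faster initial phase mentioned above.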

We make the following observations on these results. 

1. Where it is applicable, multigrid requires the smallest number of iterations and 
has the smallest convergence factors. MG/ILU is superior to MG/DGS in these mea- 
sures. These observations agree with those of [35]. In addition, where it is applicable, 
MG/ILU requires the smallest number of operations. However, multigrid is only ef- 
fective for discretizations where velocities and pressures are defined on the same grid. 

2. The Krylov subspace methods and MG/DGS are roughly equal in cost. The Krylov 
subspace methods are more widely applicable than multigrid. 

3. The performances of all these methods are very close. In terms of operation counts, 
the ratio of costs of the most expensive and least expensive method is no worse than 
2.3. 
Table 5 
Estimates of convergence factors. 

                                     Uzawa   PCR   PCG   MG/DGS   MG/ILU
Finite Differences  k_1 = k_2 = 1    .67     .70   .66   .62      .39
                    k_1 = k_2 = 2    .60     .64   .57   .50      .31
P_1(h)P_0(2h)       k_1 = k_2 = 1    .69     .69   .70   -        -
                    k_1 = k_2 = 2    .58     .66   .55   -        -
P_1(h)P_1(2h)       k_1 = k_2 = 1    .82     .79   .75   -        -
                    k_1 = k_2 = 2    .84     .78   .70   -        -
P_1(h)P_1(h)        k_1 = k_2 = 1    …       …     …     …        …
                    k_1 = k_2 = 2    …       …     …     …        …

[Figures 2-5: residual norm versus operation count.] 

Fig. 2. Operation counts for finite difference discretization. 

Fig. 3. Operation counts for $P_1(h)P_0(2h)$ finite element discretization. 

Fig. 4. Operation counts for $P_1(h)P_1(2h)$ finite element discretization. 

Fig. 5. Operation counts for $P_1(h)P_1(h)$ finite element discretization. 
4. No Krylov subspace method is clearly superior to the others. PCG exhibits a 
somewhat faster convergence rate than PCR, and the Uzawa algorithm is surprisingly 
competitive with the other two methods. This appears to derive from the dependence 
of PCG and PCR on both the spectral condition number $\kappa$ from (13) and the accuracy 
of the preconditioner $Q_A$ as an approximation to $A$; for both these methods, the 
iteration counts go down in all cases when the number of smoothing steps in $Q_A$ 
increases. The Uzawa method appears to be less sensitive to the accuracy of $Q_A$. The 
values of $\kappa$ for the four problems are: 

Finite differences    4.14        $P_1(h)P_1(2h)$    22.71 

$P_1(h)P_0(2h)$       4.87        $P_1(h)P_1(h)$     9.91 

The Uzawa method is least effective for the $P_1(h)P_1(2h)$ discretization, which has the 
largest condition number. 

5. The Uzawa and PCG methods depend on choices of iteration parameters. These can 
be estimated relatively inexpensively (e.g., using a coarse grid for the Uzawa method, 
and a few steps of the power method for PCG), but this increases the cost of these 
methods and makes implementing them considerably more difficult. In contrast, PCR 
is independent of parameters except for those needed for the multigrid precondition- 
ing, and it is therefore easier to implement. Thus, there is a tradeoff between these 
methodologies: PCR converges slightly more slowly than PCG and, often, than the 
Uzawa method, but it has a simpler implementation. 

6. For each of the solution strategies except PCG, it is less expensive to use one 
smoothing step than two. 

Acknowledgements. The author wishes to thank David Silvester for a careful 
reading of a preliminary version of this paper, and Andy Wathen for some helpful 
remarks. 
Abstract 

We present an optimal preconditioning algorithm that is equally applicable to the 
dual (FETI) and primal (Balancing) Schur complement domain decomposition methods, 
and which successfully addresses the problems of subdomain heterogeneities including 
the effects of large jumps of coefficients. The proposed preconditioner is derived from 
energy principles and embeds a new coarsening operator that propagates the error glob- 
ally and accelerates convergence. The resulting iterative solver is illustrated with the 
solution of highly heterogeneous elasticity problems. 

1. Introduction 

With the advent of parallel processing, domain decomposition (DD) based 
iterative algorithms have become increasingly popular for the solution of finite ele- 
ment systems of equations. Indeed, domain decomposition provides a higher level 
of concurrency than global algebraic approaches, and is simpler to implement on 
most parallel computational platforms [ref. 1]. In general, the subdomain equa- 
tions are solved using a direct skyline or sparse factorization based algorithm, 
and the interface problem is solved iteratively — usually, by a preconditioned con- 
jugate gradient (PCG) algorithm (for symmetric problems). The success of such 
an iterative algorithm hinges on two important properties: numerical scalabil- 
ity, and parallel scalability. A subdomain based iterative method is said to be 
numerically scalable if the condition number of its corresponding interface prob- 
lem does not grow or grows “weakly” with the mesh size h and the subdomain 
size H. For example, if the interface problem has a condition number $\kappa$ that 
grows asymptotically as 

\[ \kappa = O\left(1 + \log^2\left(\frac{H}{h}\right)\right) \tag{1} \]

then the underlying subdomain based iterative method is numerically scalable. 
The practical implications of a condition number such as that described in Eq. (1) 
are twofold: 

• Suppose that a given mesh is fixed: one processor is assigned to every 
subdomain, and the number of subdomains is increased in order to increase 
parallelism. In that case, h is fixed and H is decreased. From Eq. (1), it 
follows that the condition number of the interface problem decreases. This 
implies that the number of iterations for convergence can be expected to 
decrease with an increasing number of subdomains. 

• On most distributed memory parallel processors, the total amount of avail- 
able memory increases with the number of processors. When solving a certain 
class of problems on such parallel hardware, it is customary to define in each 
processor a constant subproblem size, and to increase the total problem size 
with the number of processors. In such a case, h and H are decreased, but the 
ratio H/h is kept constant. In theory, it follows from Eq. (1) that a numeri- 
cally scalable DD algorithm can solve larger problems with the same number 
of iterations that are required for smaller problems simply by increasing the 
number of subdomains. 
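
The two scenarios can be illustrated numerically. The sketch below assumes that the bound of Eq. (1) depends on h and H only through the ratio H/h, with a squared logarithm; the natural logarithm is used, the constant hidden in the O(.) notation is dropped, and the function name and mesh sizes are illustrative only.

```python
import math

def kappa_growth(H, h):
    """Growth term 1 + log^2(H/h); the constant hidden in O(.) is omitted."""
    return 1.0 + math.log(H / h) ** 2

# Scenario 1: fixed mesh size h, more (hence smaller) subdomains.
h = 1.0 / 256
fixed_mesh = [kappa_growth(H, h) for H in (1 / 2, 1 / 4, 1 / 8, 1 / 16)]

# Scenario 2: fixed subproblem size H/h = 32, more subdomains and a finer mesh.
fixed_ratio = [kappa_growth(H, H / 32) for H in (1 / 2, 1 / 4, 1 / 8, 1 / 16)]
```

In the first list the growth term decreases as H shrinks, and in the second it stays constant, which mirrors the two practical implications listed above.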

In practice, numerical scalability is most interesting when parallel scalability 
can also be achieved. The latter property characterizes the ability of an imple- 
mented algorithm to deliver a larger speedup for a larger number of processors. 
Therefore, a subdomain based iterative method that boasts both numerical and 
parallel scalability is clearly an “ultimate” solution algorithm. Unfortunately, 
numerical scalability can be achieved only if, at each CG iteration, the DD algo- 
rithm can propagate the error globally to accelerate convergence. Since a global 
propagation usually induces long range communication, it follows that numerical 
scalability and parallel scalability are often two conflicting objectives. Domain 
decomposition theory suggests that a good approach for tackling this issue is to 
augment the DD algorithm with a coarse “grid” problem [ref. 2-4] that is large 
enough to disseminate significant information globally and yet is small enough to 
keep computations and communication affordable. Moreover, specialized iterative 
algorithms are now available for solving efficiently these coarse grid problems on 
massively parallel processors [ref. 5,6]. Therefore, an ultimate DD based iterative 
solver is conceivable. 
The dual Schur complement method, also known as the Finite Element Tearing 
and Interconnecting (FETI) method [ref. 7-10], is among the first DD methods 
to have demonstrated numerical and parallel scalability for the solution of 
self-adjoint elliptic partial differential equations (PDE) discretized with 
unstructured finite elements. This method has also been shown to outperform several 
popular direct and iterative algorithms on both sequential and parallel computing 
platforms [ref. 1,10]. Essentially, the FETI algorithm can be viewed as a two-step 
CG-based iterative procedure where subdomain problems with Dirichlet boundary 
conditions are solved in the preconditioning step, and related subdomain 
problems with Neumann boundary conditions are solved in the second step. We 
refer to the FETI method as the dual Schur complement method because at the 
outset it constructs the dual Schur complement operator. For time-independent 
elasticity problems, the condition number of the unpreconditioned FETI interface 
problem grows asymptotically as [ref. 1,11] 

\[ \kappa = O\left(\frac{H}{h}\right) \tag{2} \]

When preconditioned with a subdomain based Dirichlet operator, the condition 
number of the FETI interface problem varies as [ref. 1,11,12] 

\[ \kappa = O\left(1 + \log^{\beta}\left(\frac{H}{h}\right)\right), \quad \beta \le 3 \tag{3} \]

The conditioning results (2) and (3) highlight the numerical scalability of the 
FETI method with respect to both the mesh size h and the number of subdomains 
(which is related to 1/H). The parallel scalability of this DD method (that is, 
its ability to achieve larger speedups for larger number of processors — has also 
been demonstrated on current massively parallel processors for several realistic 
structural problems [ref. 1,5]. 

The numerical scalability of the FETI method is due to a coarse problem 
naturally present in the formulation of the interface problem. In order to guar- 
antee the solvability of the local Neumann problems associated with floating 
subdomains — that is, subdomains without enough essential boundary conditions 
to prevent the local stiffness matrices from being singular — a small auxiliary 

global problem with at most 6 unknowns per subdomain is solved at each PCG 
iteration. In [ref. 11], it was shown that this auxiliary problem indeed plays the 
role of a coarse problem; it provides a satisfactory mechanism for global propaga- 
tion of the error, which accelerates convergence so that the number of iterations 
is practically independent of the number of subdomains. 
Another numerically scalable algorithm for elasticity problems is the Balancing 
DD method [ref. 13]. This method is essentially an important improvement 
of the well-known Neumann-Neumann DD algorithm [ref. 14]. The original 
Neumann-Neumann DD method can be summarized as a two-step CG-based iterative 
procedure where subdomain problems with Neumann boundary conditions 
are solved in the preconditioning step and subdomain problems with Dirichlet 
boundary conditions are solved in the second step. We refer to this method as 
the primal Schur complement method because at the outset it constructs the 
primal Schur complement operator. The original Neumann-Neumann method 
lacks a coarse grid problem for propagating the error globally and accelerating 
convergence. In practice, its rate of convergence deteriorates significantly when 
more than 8 subdomains are introduced [ref. 15]. As in the FETI method, the 
coarse problem of the Balancing DD algorithm is defined in terms of the null 
spaces of the local stiffness matrices. This coarse problem restores the scalability 
of the original Neumann-Neumann method for a large number of subdomains. 

However, it should be noted that the theoretical scalability and optimal 
conditioning properties of the FETI and Balancing DD methods hold in prac- 
tice when the subdomains have good and/or comparable aspect ratios, and the 
partial differential equation to be solved does not feature large (subdomain) co- 
efficient jumps [ref. 1,11]. Each of these two issues represents a different type 
of subdomain heterogeneity that must be dealt with. In [ref. 16], the authors 
have proposed a remedy to the first problem in the form of a mesh partition- 
ing optimizer that delivers subdomains with good aspect ratios. In [ref. 17], an 
ad-hoc scaling procedure was discussed in the context of the Neumann-Neumann 
DD method for handling potential subdomain heterogeneities. In this paper, we 
present a rational and superior approach for tackling simultaneously and indif- 
ferently all kinds of subdomain heterogeneities. Our methodology is based on 
energy principles and is best described as a smoothing scheme. However, we also 
formulate it as a preconditioner. For problems with more than two subdomains, 
this smoothing scheme generates a coarse grid subproblem that propagates the 
error globally and accelerates convergence. Because of space limitations, we con- 
sider only the case of the FETI or dual Schur complement method. However, the 
experienced reader will be able to easily transpose the described methodology 
to the case of the Balancing or primal Schur complement method. We report 
some preliminary numerical results that demonstrate superior convergence rates 
for highly heterogeneous elasticity problems. 
2. The FETI or dual Schur complement method 
The problem to be solved is 

\[ A x = b \tag{4} \]

where $A$ is an $n \times n$ symmetric positive semi-definite sparse matrix arising from 
the finite element discretization of an elasticity problem defined over a region $\Omega$, 
and $b$ is a right hand side n-long vector representing some prescribed forces. If $\Omega$ 
is partitioned into a set of $N_s$ disconnected subdomains $\Omega^{(s)}$, the FETI method 
consists in replacing Eq. (4) by the equivalent system of subdomain equations 

\[
\begin{aligned}
A^{(s)} x^{(s)} &= b^{(s)} - B^{(s)T}\lambda, \qquad s = 1, \dots, N_s \\
\sum_{s=1}^{N_s} B^{(s)} x^{(s)} &= 0
\end{aligned}
\tag{5}
\]

where $A^{(s)}$ and $b^{(s)}$ are the restriction of $A$ and $b$ to the disconnected 
subdomain $\Omega^{(s)}$, $\lambda$ is a vector of Lagrange multipliers representing the normal 
derivatives of the primal variable of the problem on the subdomain interface 
boundary $\Gamma_I^{(s)}$, and $B^{(s)}$ is a signed Boolean matrix which describes the 
interconnectivity of the subdomains. From a physical viewpoint, the first of 
Eqs. (5) represents the subdomain equations of equilibrium with Neumann 
boundary conditions, $\lambda$ represents the “gluing” forces between the disconnected 
subdomains (Figure 1), and the second of Eqs. (5) represents the compatibility 
of the subdomain solutions $x^{(s)}$ across the subdomain interfaces 
$\Gamma_I = \bigcup_{s=1}^{N_s} \Gamma_I^{(s)}$. A more elaborate derivation of Eqs. (5) can be found in [ref. 1,7-11]. 

Figure 1. Schematic description of the FETI method (panels: Mesh Tearing, Mesh Reassembly). 


In general, the mesh partition will contain some $N_f$ floating subdomains, 
and therefore the Neumann problems 

\[ A^{(s)} x^{(s)} = b^{(s)} - B^{(s)T}\lambda, \qquad s = 1, \dots, N_f \tag{6} \]

will be singular. To guarantee the solvability of these problems, we require that 

\[ \left(b^{(s)} - B^{(s)T}\lambda\right) \perp \mathrm{Ker}\left(A^{(s)}\right) \tag{7} \]

and compute the solution of Eq. (6) as 

\[ x^{(s)} = A^{(s)+}\left(b^{(s)} - B^{(s)T}\lambda\right) + R^{(s)}\alpha^{(s)} \tag{8} \]

where $A^{(s)+}$ is a generalized inverse of $A^{(s)}$ that need not be explicitly computed 
(see, for example, [ref. 9]), $R^{(s)} = \mathrm{Ker}(A^{(s)})$ is the null space of $A^{(s)}$, and $\alpha^{(s)}$ 
is a vector of six or fewer constants (there are, at most, six rigid body modes in 
a three-dimensional elasticity problem). The introduction of the few additional 
unknowns $\alpha^{(s)}$ is compensated by the additional equations resulting from (7): 

\[ R^{(s)T}\left(b^{(s)} - B^{(s)T}\lambda\right) = 0, \qquad s = 1, \dots, N_f \tag{9} \]
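
The role of the generalized inverse and of the null space can be seen on a small example. The sketch below is an illustration under stated assumptions, not part of the method's implementation: the floating subdomain is modeled by a one-dimensional free-free bar whose stiffness matrix has the constant vector as its only rigid body mode, the Moore-Penrose pseudoinverse plays the role of the generalized inverse, and `rhs` stands for the right hand side of Eq. (6).

```python
import numpy as np

n = 5
# Stiffness matrix of a free-free 1D bar: singular, kernel spanned by constants.
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
R = np.ones((n, 1)) / np.sqrt(n)           # basis of Ker(A): one rigid body mode

rng = np.random.default_rng(1)
rhs = rng.standard_normal(n)
rhs -= R @ (R.T @ rhs)                     # enforce the solvability condition (7)

x_particular = np.linalg.pinv(A) @ rhs     # one choice of generalized inverse
alpha = np.array([0.7])                    # arbitrary rigid body amplitude
x = x_particular + R @ alpha               # general solution in the form of Eq. (8)

residual = A @ x - rhs
```

Once the right hand side is orthogonal to the null space, every choice of `alpha` gives a solution of the singular problem, which is why the amplitudes must be determined by the additional equations.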

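Substituting Eq. (8) into the second of Eqs. (5) and using Eq. (9) leads (after 
some algebraic manipulations) to the following FETI interface problem: 

\[ \begin{pmatrix} F_I & -G_I \\ -G_I^T & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \alpha \end{pmatrix} = \begin{pmatrix} d \\ -e \end{pmatrix} \tag{10} \]

where 

\[
\begin{aligned}
F_I &= \sum_{s=1}^{N_s} B^{(s)} A^{(s)+} B^{(s)T}; \qquad G_I = \left[ B^{(1)}R^{(1)} \;\dots\; B^{(N_f)}R^{(N_f)} \right] \\
\alpha &= \left[ \alpha^{(1)T} \;\dots\; \alpha^{(N_f)T} \right]^T; \qquad d = \sum_{s=1}^{N_s} B^{(s)} A^{(s)+} b^{(s)}; \qquad e = \left[ b^{(1)T}R^{(1)} \;\dots\; b^{(N_f)T}R^{(N_f)} \right]^T \\
A^{(s)+} &= A^{(s)-1} \quad \text{if } \Omega^{(s)} \text{ is not a floating subdomain} \\
A^{(s)+} &= \text{a generalized inverse of } A^{(s)} \quad \text{if } \Omega^{(s)} \text{ is a floating subdomain}
\end{aligned}
\]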

Clearly, $F_I$ is the sum of independent subdomain operators. Under certain 
conditions, it can be shown that $F_I$ is the sum of the inverses of the subdomain Schur 
complements [ref. 1,11], which justifies the labeling of the FETI method as the 
dual Schur complement method. It possesses some interesting spectral properties 
that trigger a superconvergent behavior of a CG algorithm applied to the 
solution of (10) [ref. 1,11]. Because the above interface problem (10) is indefinite, 
the second step of the FETI method consists in solving it via a preconditioned 
conjugate projected gradient (PCPG) algorithm with a preconditioner $\bar F_I^{-1}$ and 
the projector 

\[ P = I - G_I \left(G_I^T G_I\right)^{-1} G_I^T \tag{11} \]

More specifically, the PCPG FETI algorithm can be formulated as follows 
[ref. 1]: 

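1. Initialize 

\[
\begin{aligned}
\lambda^0 &= G_I \left(G_I^T G_I\right)^{-1} e \\
r^0 &= d - F_I \lambda^0
\end{aligned}
\]

2. Iterate $k = 1, 2, \dots$ until convergence 

\[
\begin{aligned}
\text{Project} \qquad w^{k-1} &= P^T r^{k-1} \\
\text{Precondition} \qquad z^{k-1} &= \bar F_I^{-1} w^{k-1} \\
\text{Project} \qquad y^{k-1} &= P z^{k-1} \\
\zeta^k &= y^{k-1\,T} w^{k-1} / y^{k-2\,T} w^{k-2} \qquad (\zeta^1 = 0) \\
p^k &= y^{k-1} + \zeta^k p^{k-1} \qquad (p^1 = y^0) \\
\nu^k &= y^{k-1\,T} w^{k-1} / p^{k\,T} F_I p^k \\
\lambda^k &= \lambda^{k-1} + \nu^k p^k \\
r^k &= r^{k-1} - \nu^k F_I p^k
\end{aligned}
\tag{12}
\]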

The reader can easily check that, because of the presence of the second projec- 
tion step, the iterates are independent of the particular choice of the generalized 
inverse in Eq. (8). 
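
The projected iteration can be sketched with dense matrices. The following is a minimal illustration, not the parallel FETI implementation: `F`, `G`, `d`, and `e` are small random stand-ins for the interface operator, the constraint matrix, and the right hand sides, the identity is used in place of the preconditioner, the function name `pcpg` and the stopping rule are assumptions, and the saddle-point system is taken in the form F lam - G alpha = d with constraint G^T lam = e.

```python
import numpy as np

def pcpg(F, G, d, e, precond=None, tol=1e-10, maxit=200):
    """Projected CG for F lam - G alpha = d subject to G^T lam = e."""
    GtG = G.T @ G

    def project(v):
        # P v with P = I - G (G^T G)^{-1} G^T; P is symmetric, so P^T = P.
        return v - G @ np.linalg.solve(GtG, G.T @ v)

    if precond is None:
        precond = lambda w: w                 # identity stand-in preconditioner
    lam = G @ np.linalg.solve(GtG, e)         # starting point satisfies G^T lam = e
    r = d - F @ lam
    w = project(r)
    stop = tol * np.linalg.norm(w)            # relative projected-residual test
    p, yw_prev = None, None
    for _ in range(maxit):
        if np.linalg.norm(w) <= stop:
            break
        y = project(precond(w))
        yw = y @ w
        p = y if p is None else y + (yw / yw_prev) * p
        Fp = F @ p
        nu = yw / (p @ Fp)
        lam = lam + nu * p
        r = r - nu * Fp
        w, yw_prev = project(r), yw
    # Recover alpha from the first block row: G alpha = F lam - d.
    alpha = np.linalg.solve(GtG, G.T @ (F @ lam - d))
    return lam, alpha

rng = np.random.default_rng(3)
n, m = 12, 3
M = rng.standard_normal((n, n))
F = M @ M.T + n * np.eye(n)                   # symmetric positive definite stand-in
G = rng.standard_normal((n, m))
d, e = rng.standard_normal(n), rng.standard_normal(m)
lam, alpha = pcpg(F, G, d, e)

# Reference: direct solve of the full saddle-point system.
K = np.block([[F, -G], [-G.T, np.zeros((m, m))]])
ref = np.linalg.solve(K, np.concatenate([d, -e]))
```

Every search direction lies in the range of the projector, so the constraint satisfied by the starting point is preserved by all iterates, and the small system with matrix G^T G is the only globally coupled computation in each step.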

The application of the projection operator $P$ in (12) means that a coarse 
problem of the form $(G_I^T G_I)\,y = c$ (size $\le 6 \times N_f \le 6 \times N_s$) must be solved 
(twice) in each FETI iteration. It was shown in [ref. 11] that this coarse problem 
has the expected beneficial effect of coupling all subdomain computations and 
propagating the error globally, so that the condition number of the interface 
problem can be bounded as a function of H/h but is independent of the number 
of subdomains. 

Two preconditioners have been previously developed for the FETI method: 
(a) a numerically optimal Dirichlet preconditioner that can be written as 

\[
\bar F_I^{D^{-1}} = \sum_{s=1}^{N_s} B^{(s)} \begin{pmatrix} 0 & 0 \\ 0 & A_{bb}^{(s)} - A_{ib}^{(s)T} A_{ii}^{(s)-1} A_{ib}^{(s)} \end{pmatrix} B^{(s)T}
= \sum_{s=1}^{N_s} B^{(s)} \begin{pmatrix} 0 & 0 \\ 0 & S_{bb}^{(s)} \end{pmatrix} B^{(s)T}
\tag{13}
\]

where $S_{bb}^{(s)}$ denotes the primal Schur complement of subdomain $\Omega^{(s)}$ and the 
subscripts i and b designate internal and interface boundary unknowns, respectively; 
and (b) a numerically efficient “lumped” preconditioner that lumps the Dirichlet 
operator on the subdomain interface unknowns 

\[
\bar F_I^{L^{-1}} = \sum_{s=1}^{N_s} B^{(s)} \begin{pmatrix} 0 & 0 \\ 0 & A_{bb}^{(s)} \end{pmatrix} B^{(s)T}
\tag{14}
\]

Unlike $\bar F_I^{D^{-1}}$, the preconditioner $\bar F_I^{L^{-1}}$ is not mathematically optimal. However, 
it is more economical than $\bar F_I^{D^{-1}}$ and has often proved to be more efficient [ref. 
1,11]. 
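
The difference between the two subdomain kernels can be made concrete for a single subdomain. The sketch below uses a random symmetric positive definite matrix as a stand-in for a subdomain stiffness matrix, ordered with internal unknowns first; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
ni, nb = 5, 3                                  # internal and interface unknowns
M = rng.standard_normal((ni + nb, ni + nb))
A = M @ M.T + (ni + nb) * np.eye(ni + nb)      # stand-in subdomain stiffness matrix
A_ii, A_ib, A_bb = A[:ni, :ni], A[:ni, ni:], A[ni:, ni:]

# Dirichlet kernel: the primal Schur complement, which needs a solve with A_ii.
S_bb = A_bb - A_ib.T @ np.linalg.solve(A_ii, A_ib)

# Lumped kernel: the interface block alone, with no interior solve.
lumped = A_bb

gap_eigenvalues = np.linalg.eigvalsh(lumped - S_bb)
```

The Schur complement is the inverse of the interface block of the inverse stiffness matrix, while the lumped kernel differs from it by a positive semi-definite term. This is consistent with the lumped operator being cheaper to apply and less accurate.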

For practical elasticity problems, the FETI method with either the Dirichlet 
or Lumped preconditioner is numerically and parallel-wise scalable, when the 
subdomains have good aspect ratios and no large coefficient jumps. The objective 
of this paper is to present a third preconditioner that generalizes the two described 
above and successfully addresses all kinds of heterogeneity problems. 

3. Preconditioning with an energy based smoothing procedure 

3.1. The two-subdomain problem 

In order not to obscure the main idea of this paper by the complexity of 
the notation needed for a problem with an arbitrary number of subdomains, we 
consider first the case of a problem with two heterogeneous subdomains. The 
general case of a system including multiple (N s > 2) and arbitrarily connected 
subdomains is treated in Section 3.2. 

At each iteration of the PCPG FETI algorithm, the matrix vector product 
$F_I p^k$ produces a jump in the iterate $x^k$ across the subdomain interfaces. (In the 
sequel, we drop the superscript k for simplicity.) For a heterogeneous problem 
(for example, a problem with different subdomain stiffnesses), this jump is bound 
to be rather large. Elementary mechanics theory suggests that the solution 
in the stiffer subdomain $\Omega^{(s)}$ will be closer to the desired converged solution 
than the solution in the more flexible subdomain. This in turn suggests that the 
computed solution x should be smoothed after each PCPG iteration as follows: 



Once again, the subscripts i and b designate the internal and interface boundary 
unknowns. Equations (15) state that first a smoothing of the solution is imposed 
on the interface boundary between the two subdomains, then a local Dirichlet 
problem is solved in each subdomain to propagate the beneficial effect of this 
smoothing to the internal unknowns. Of course, the important question is how 
to select the optimal smoothing parameter a. 

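Let $\delta_I$ denote the displacement jump on $\Gamma_I$ defined as 

\[ \delta_I = x_b^{(2)} - x_b^{(1)} \tag{16} \]

From Eqs. (15) and (16), it follows that $\bar x_b^{(1)}$ and $\bar x_b^{(2)}$ can be rewritten as 

\[ \bar x_b^{(1)} = x_b^{(1)} + \Delta x_b^{(1)}, \qquad \bar x_b^{(2)} = x_b^{(2)} + \Delta x_b^{(2)} \tag{17} \]

where 

\[ \Delta x_b^{(1)} = \alpha\,\delta_I, \qquad \Delta x_b^{(2)} = -(1 - \alpha)\,\delta_I \tag{18} \]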

First, we note that Eqs. (5) can be rewritten as 


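$$\begin{bmatrix}
A_{ii}^{(1)} & A_{ib}^{(1)} & 0 & 0 & 0 \\
A_{ib}^{(1)T} & A_{bb}^{(1)} & 0 & 0 & b^{(1)T} \\
0 & 0 & A_{ii}^{(2)} & A_{ib}^{(2)} & 0 \\
0 & 0 & A_{ib}^{(2)T} & A_{bb}^{(2)} & b^{(2)T} \\
0 & b^{(1)} & 0 & b^{(2)} & 0
\end{bmatrix}
\begin{bmatrix} x_i^{(1)} \\ x_b^{(1)} \\ x_i^{(2)} \\ x_b^{(2)} \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f_i^{(1)} \\ f_b^{(1)} \\ f_i^{(2)} \\ f_b^{(2)} \\ 0 \end{bmatrix} \tag{19}$$

and that the smoothed solution satisfies 

$$\begin{bmatrix}
A_{ii}^{(1)} & A_{ib}^{(1)} & 0 \\
A_{ib}^{(1)T} & A_{bb}^{(1)} + A_{bb}^{(2)} & A_{ib}^{(2)T} \\
0 & A_{ib}^{(2)} & A_{ii}^{(2)}
\end{bmatrix}
\begin{bmatrix} \bar{x}_i^{(1)} \\ \bar{x}_b \\ \bar{x}_i^{(2)} \end{bmatrix}
=
\begin{bmatrix} f_i^{(1)} \\ f_b^{(1)} + f_b^{(2)} \\ f_i^{(2)} \end{bmatrix}
+
\begin{bmatrix} 0 \\ r_b \\ 0 \end{bmatrix} \tag{20}$$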
where $r_b$ is the interface residual induced by smoothing. From Eq. (19), it follows 
that 

$$r_b = S_{bb}^{(1)} \Delta x_b^{(1)} + S_{bb}^{(2)} \Delta x_b^{(2)} \tag{21}$$

where $S_{bb}^{(s)}$ is the Schur complement with respect to the interface boundary 
unknowns of the stiffness matrix of subdomain $\Omega^{(s)}$, 

$$S_{bb}^{(s)} = A_{bb}^{(s)} - A_{ib}^{(s)T} A_{ii}^{(s)^{-1}} A_{ib}^{(s)} \tag{22}$$

Rewriting the induced interface residual in terms of the solution jump $\delta_I$ as 

$$r_b = r_b(\alpha) = \bigl( \alpha S_{bb}^{(1)} + (\alpha - 1) S_{bb}^{(2)} \bigr) \delta_I \tag{23}$$

leads to the conclusion that the optimal parameter $\alpha$ of the smoothing procedure 
(15) is that which minimizes $r_b$. However, rather than directly minimizing some 
norm of $r_b$, we propose to adopt a Rayleigh-Ritz approach where the smoothed 
solutions $\bar{x}^{(1)}(\alpha)$ and $\bar{x}^{(2)}(\alpha)$ given in Eqs. (15) are viewed as kinematically 
admissible fields parameterized by $\alpha$, and to minimize the corresponding energy 
of the global system. For the two-subdomain problem discussed here, the total 
energy can be written as 


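$$\mathcal{E}(\alpha) = \frac{1}{2}
\begin{bmatrix} \bar{x}_i^{(1)T} & \bar{x}_b^{T} & \bar{x}_i^{(2)T} \end{bmatrix}
\begin{bmatrix}
A_{ii}^{(1)} & A_{ib}^{(1)} & 0 \\
A_{ib}^{(1)T} & A_{bb}^{(1)} + A_{bb}^{(2)} & A_{ib}^{(2)T} \\
0 & A_{ib}^{(2)} & A_{ii}^{(2)}
\end{bmatrix}
\begin{bmatrix} \bar{x}_i^{(1)} \\ \bar{x}_b \\ \bar{x}_i^{(2)} \end{bmatrix}
-
\begin{bmatrix} \bar{x}_i^{(1)T} & \bar{x}_b^{T} & \bar{x}_i^{(2)T} \end{bmatrix}
\begin{bmatrix} f_i^{(1)} \\ f_b^{(1)} + f_b^{(2)} \\ f_i^{(2)} \end{bmatrix} \tag{24}$$

which in view of Eqs. (15)-(23) simplifies to 

$$\mathcal{E}(\alpha) = C - 2\alpha\, \delta_I^T S_{bb}^{(2)} \delta_I + \alpha^2\, \delta_I^T \bigl( S_{bb}^{(1)} + S_{bb}^{(2)} \bigr) \delta_I \tag{25}$$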


where $C$ is an expression that does not depend on $\alpha$. Differentiating $\mathcal{E}$ with 
respect to $\alpha$, recalling Eq. (16), and enforcing the condition 

$$\frac{d\mathcal{E}}{d\alpha} = -2\, \delta_I^T S_{bb}^{(2)} \delta_I + 2\alpha\, \delta_I^T \bigl( S_{bb}^{(1)} + S_{bb}^{(2)} \bigr) \delta_I = 0 \tag{26}$$

finally gives 

$$\alpha^D = \frac{k^{(2)D}}{k^{(1)D} + k^{(2)D}}, \qquad
k^{(1)D} = \delta_I^T S_{bb}^{(1)} \delta_I = (x_b^{(2)} - x_b^{(1)})^T S_{bb}^{(1)} (x_b^{(2)} - x_b^{(1)}), \qquad
k^{(2)D} = \delta_I^T S_{bb}^{(2)} \delta_I \tag{27}$$

To the authors of this paper, the importance of the above selection of the 
parameter $\alpha$ is best recognized from a physical viewpoint. Indeed, the smoothing 
procedure described by Eqs. (15) and (27) consists in treating the two subdomains 
as two linear springs connected in series, computing the jump of the 
displacement field at their connection, and redistributing this jump among both 
springs according to their "relative stiffnesses" $k^{(1)D}$ and $k^{(2)D}$. While the idea of 
estimating a local measure of the stiffness of a subdomain to build a scaling 
matrix for the subdomain preconditioner is not new [ref. 1,17], the derivation of 
the smoother presented in this paper sheds new light on the precise treatment 
of all kinds of stiffness heterogeneities. More importantly, Eqs. (27) give for the 
first time rational estimates $k^{(1)D}$ and $k^{(2)D}$ of the local measures of the subdomain 
stiffnesses that clearly contain, among others, the effect of material properties 
(PDE coefficients), mesh resolutions, and aspect ratios. From a mathematical 
viewpoint, these constants can be described as the Schur-complement norms of 
the jump of the solution at the subdomain interfaces. Note that if the two 
subdomains and their finite element models are identical, Eqs. (27) give $k^{(1)D} = k^{(2)D}$ 
and $\alpha^D = 1/2$. If the two subdomains differ only in a constant, for example, 
Young's modulus $E$, then Eqs. (27) give $\alpha^D = E^{(2)}/(E^{(1)} + E^{(2)})$. This clearly 
shows that the smoothing procedure proposed in this paper includes the scaling 
schemes proposed in [ref. 1,17] as a particular case. However, if the subdomains 
do not differ only in one constant, the scaling procedure $\alpha^D = E^{(2)}/(E^{(1)} + E^{(2)})$ 
is not applicable, but the smoothing scheme proposed here is. Moreover, for 
problems with more than two subdomains, we will show in Section 3.2 that, unlike 
the scaling procedure discussed in [ref. 1,17], the smoothing algorithm presented 
in this paper generates a coarse grid problem that accelerates convergence. 

The superscript $D$ in Eqs. (27) is used to highlight the fact that computing 
the smoothing parameter $\alpha^D$ requires solving subdomain Dirichlet problems that 
are similar to those induced by the optimal Dirichlet preconditioner (13). Clearly, 
this establishes that the smoothing procedure (15,27) can be viewed as an 
improved optimal Dirichlet preconditioner $\bar{F}_I^{D^{-1}}$. Alternatively, we can construct 
a more economical variant of the proposed smoother where the effect of the 
interface smoothing is not back-propagated to the subdomain internal unknowns. 
Following the derivation presented above, the reader can easily check that such a 
strategy leads to a smoothing procedure similar to that given by Eqs. (15) but 
where the Schur-complement matrices $S_{bb}^{(s)}$ are replaced by the "lumped" interface 
stiffness matrices $A_{bb}^{(s)}$, and to the following "lumped" averaging parameter 

$$\alpha^L = \frac{k^{(2)L}}{k^{(1)L} + k^{(2)L}}, \qquad k^{(s)L} = \delta_I^T A_{bb}^{(s)} \delta_I, \quad s = 1, 2 \tag{28}$$

Of course, smoothing with the above lumped strategy can also be viewed as 
preconditioning with an improved lumped preconditioner $\bar{F}_I^{L^{-1}}$. The 
computational advantages of $\bar{F}_I^{L^{-1}}$ are obvious since the $A_{bb}^{(s)}$ are readily available, and 
sparse matrix-vector multiplications rather than forward-backward substitutions 
are needed to evaluate the smoothing parameter $\alpha^L$. 
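
The spring analogy behind Eqs. (27) can be checked on a one-dimensional bar. The sketch below is illustrative only and is not taken from the paper: the helper names (`solve`, `bar_schur`, `alpha_dirichlet`) and the clamped-bar test case are assumptions. Each subdomain is a chain of identical springs clamped at its outer end, so its Schur complement on the single interface unknown is a scalar, and the smoothing parameter is the ratio $k^{(2)}/(k^{(1)} + k^{(2)})$.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def bar_schur(n_elems, stiffness):
    """Schur complement A_bb - A_ib^T A_ii^{-1} A_ib of a bar made of n_elems springs.

    The bar is clamped at its outer end; its other end is the interface unknown.
    """
    n_int = n_elems - 1                       # number of interior unknowns
    a_bb = stiffness                          # the interface node touches one spring
    if n_int == 0:
        return a_bb
    a_ii = [[0.0] * n_int for _ in range(n_int)]
    for i in range(n_int):
        a_ii[i][i] = 2.0 * stiffness
        if i > 0:
            a_ii[i][i - 1] = a_ii[i - 1][i] = -stiffness
    a_ib = [0.0] * n_int
    a_ib[-1] = -stiffness                     # last interior node couples to the interface
    y = solve(a_ii, a_ib)                     # A_ii^{-1} A_ib
    return a_bb - sum(a_ib[i] * y[i] for i in range(n_int))


def alpha_dirichlet(s1, s2, x_b1, x_b2):
    """alpha = k2 / (k1 + k2) with k_s = delta * S_s * delta (delta must be nonzero)."""
    delta = x_b2 - x_b1
    k1, k2 = delta * s1 * delta, delta * s2 * delta
    return k2 / (k1 + k2)


s1 = bar_schur(4, 1.0)                        # flexible subdomain
s2 = bar_schur(4, 3.0)                        # subdomain three times stiffer
alpha = alpha_dirichlet(s1, s2, x_b1=0.2, x_b2=1.0)
x_bar = 0.2 + alpha * (1.0 - 0.2)             # smoothed interface value
```

With identical meshes and a stiffness ratio of 3, the parameter reduces to the modulus ratio $E^{(2)}/(E^{(1)} + E^{(2)}) = 3/4$, and the smoothed interface value moves toward the trace of the stiffer subdomain.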

3.2. The multiple subdomain problem and the new coarsening operator 

Here, we generalize the smoothing procedure presented in the previous section 
to the case of multiple ($N_s > 2$) and arbitrarily connected subdomains. 

Let $b^{(s)}$ denote the restriction of the Boolean operator $B^{(s)}$ defined in Eqs. (5) 
to the interface boundary of a given subdomain $\Omega^{(s)}$. Using the 
internal/interface subdomain partitioning of the unknowns, we have 

$$B^{(s)} = [\, 0 \;\; b^{(s)} \,] \tag{29}$$

The interface boundary of each subdomain can be broken into edges and therefore 
$b^{(s)}$ can be partitioned as 

$$b^{(s)} = [\, \ldots \;\; b^{(s),j} \;\; \ldots \,] \tag{30}$$

where $b^{(s),j}$ is the restriction of $b^{(s)}$ to the $j$-th edge of $\Omega^{(s)}$. Note that Eq. 
(30) implies that every interface point is assigned to one and only one edge, and 
therefore a crosspoint is treated as a single-point edge. Finally, we introduce the 
unsigned equivalents of $B^{(s)}$, $b^{(s)}$, and $b^{(s),j}$ and designate them with a circumflex. 
Using this notation, the jump of the solution $x$ across an edge $j$ between two 
subdomains $\Omega^{(s)}$ and $\Omega^{(q)}$ can be written as 

$$\delta^{(s,q),j} = \hat{b}^{(s),j} x_b^{(s),j} - \hat{b}^{(q),j} x_b^{(q),j} \tag{31}$$

where $x_b^{(s),j}$ is the trace of the subdomain solution $x^{(s)}$ on the edge $j$. 
Consequently, the generalization to an arbitrary number of subdomains of the 
smoothing procedure proposed in Eqs. (15) is given by 

[Eq. (32), which expresses the smoothed edge traces $\bar{x}_b^{(s),j}$ and the subdomain interior solutions in terms of the edge coefficients $\beta^{(s),j}$, is not legible in this copy.] (32)

It remains to find the optimal values of the edge coefficients $\beta^{(s),j}$. For this 
purpose, we follow conceptually the same Rayleigh-Ritz approach presented in 
Section 3.1. Let $\Delta x_b^{(s),j}$ be defined as 

$$\Delta x_b^{(s),j} = \bar{x}_b^{(s),j} - x_b^{(s),j} \tag{33}$$

If the coefficients $\beta^{(s),j}$ are constrained to have a unit sum 

$$\sum_{s} \beta^{(s),j} = 1 \tag{34}$$

then from Eqs. (31)-(33) it follows that 

$$\Delta x_b^{(s),j} = -\hat{b}^{(s),jT} \sum_{q \ne s} \beta^{(q),j}\, \delta^{(s,q),j} \tag{35}$$

For a problem with an arbitrary number of subdomains, the total energy can be 
written as 

$$\mathcal{E}(\beta^{(s),j}) = \sum_{s=1}^{N_s} \left( \frac{1}{2}\, \bar{x}^{(s)T} A^{(s)} \bar{x}^{(s)} - \bar{x}^{(s)T} f^{(s)} \right) \tag{36}$$

If the effect of interface smoothing is back-propagated to the subdomain interiors 
(Dirichlet smoothing), $\mathcal{E}(\beta^{(s),j})$ can be rewritten as 

$$\mathcal{E}(\beta^{(s),j}) = C + \frac{1}{2} \sum_{s=1}^{N_s} \Delta x_b^{(s)T} S_{bb}^{(s)} \Delta x_b^{(s)} \tag{37}$$

where $C$ is a function that does not depend on the edge parameters $\beta^{(s),j}$. On 
the other hand, if the effect of interface smoothing is not back-propagated to the 
subdomain interiors (lumped smoothing), $\mathcal{E}(\beta^{(s),j})$ will have an expression similar 
to that of (37) but with $A_{bb}^{(s)}$ replacing every occurrence of $S_{bb}^{(s)}$. Minimizing the 
energy with respect to the edge smoothing parameters ($\partial \mathcal{E} / \partial \beta^{(s),j} = 0$) leads after 
some algebraic transformations to 


[Eq. (38), a linear system in the edge coefficients $\beta^{(s),j}$, is not legible in this copy.] (38)


where $S_{jk}^{(s)}$ is the Schur complement of $\Omega^{(s)}$ associated with the edges $j$ and $k$. 

Hence, the edge smoothing parameters are given by the solution of a coarse 
auxiliary problem of size as small as the number of edges in the mesh partition. 
There is no question that the above system of equations (38) is quite complicated 
to read. However, there is also no question that it is easy to program since the 
$\hat{b}^{(s),j}$ are Boolean operators. 


3.3. Dealing with the variable preconditioner 

Since the values of the jumps of the iterate $x^k$ across the subdomain edges 
change in each iteration $k$, it follows from Eqs. (27) and (28) that the proposed 
preconditioner changes in every FETI PCPG iteration. Of course, one can 
always freeze the coefficients $\beta^{(s),j}$ after the first or a few iterations. However, a 
reorthogonalization is always used in practice with the FETI method [ref. 1], so 
that the variation of the preconditioner with the iteration number is not an issue. 
We note that we have previously demonstrated (see [ref. 1], for example) that 
this reorthogonalization is cost-efficient because it is applied only to the interface 
problem, and it does not significantly increase the total CPU time for solving the 
global problem. 

4. Numerical results 

In order to demonstrate the potential of the proposed preconditioner, we 
consider the plane stress analysis of a two-dimensional heterogeneous structure 
comprising steel and rubber subcomponents (see Figure 2). The global structure is 
clamped at one end and is subjected to horizontal and vertical point loads at the 
top of the other end. The nearly incompressible rubber subcomponents are 
characterized by a Young modulus $E^{rubber} = 5.0 \times 10^{7}$ N/mm$^2$ and a Poisson ratio 
$\nu^{rubber} = 0.48$, and the steel subcomponents by $E^{steel} = 2.05 \times 10^{11}$ N/mm$^2$ 
and $\nu^{steel} = 0.3$. The numerical difficulties of this problem are spurred by its 
high degree of heterogeneity, measured here by the ratio $E^{steel} / E^{rubber} = 4098$, 
and by the presence of a crosspoint between extremely stiff and extremely flexible 
subdomains. 


Figure 2. A heterogeneous steel/rubber plane elasticity problem. 

Four different meshes are constructed for the solution of this problem using 4, 16, 
64, and 256 subdomains. All meshes satisfy $H/h = 8$. The FETI method is used 
with: (a) the Dirichlet preconditioner weighted by the number of subdomains 
connected to an interface point (DR) and (b) the new smoothing-based Dirichlet 
preconditioner summarized in Eqs. (32) and (38) (SMTH). The convergence 
results are reported in Table I where $N_{eq}$ and $N_{itr}$ denote, respectively, the number 
of equations associated with each finite element discretization and the number of 
iterations. All computations are performed using MATLAB. 

Table I. Solution of a Steel/Rubber Heterogeneous Plane Elasticity Problem 

FETI solver 
Dirichlet precond. (DR) vs. new smoothing based Dirichlet precond. (SMTH) 
Global convergence criterion: $\|Ax - b\|_2 < 10^{-6} \times \|b\|_2$ 

  H       h        N_eq     N_s    N_itr (DR)    N_itr (SMTH) 
  1/2     1/16       612      4        35             11 
  1/4     1/32      2520     16       105             39 
  1/8     1/64     10224     64       153             80 
  1/16    1/128    41478    256       246             82 



With the new smoothing-based Dirichlet preconditioner, the FETI method converges in 
roughly two to three times fewer iterations than with the original Dirichlet 
preconditioner (a factor of 1.9 to 3.2 over the four meshes of Table I). 
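
The reduction in iteration count can be read off Table I row by row. The short sketch below computes the ratio of DR to SMTH iterations for each mesh; the iteration counts are transcribed from the table, while the variable and function names are ours.

```python
# Iteration counts transcribed from Table I: (N_s, N_itr with DR, N_itr with SMTH).
table = [(4, 35, 11), (16, 105, 39), (64, 153, 80), (256, 246, 82)]


def speedups(rows):
    """Return the ratio N_itr(DR) / N_itr(SMTH) for each row."""
    return [dr / smth for _, dr, smth in rows]


ratios = speedups(table)
for (n_s, dr, smth), r in zip(table, ratios):
    print(f"N_s = {n_s:3d}: {dr:3d} vs {smth:2d} iterations, ratio {r:.2f}")
```

The ratio is close to three for three of the four meshes and drops to about 1.9 for the 64-subdomain mesh.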
SUMMARY 


We describe a high performance parallel multigrid algorithm for a rather general class of unstruc- 
tured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured 
multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan 
and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse 
grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse 
developed by Baumgardner and Frederickson. 


INTRODUCTION 


The fundamental task of the algorithm PUMG is to solve a large sparse linear system of the form 

Au = v (1) 

as efficiently as possible, since it will likely need to solve it repeatedly. We assume that a tolerance 
$\epsilon$ has been given, and that an approximate solution $u$ is acceptable if the residual 

$$r = v - Au \tag{2}$$

satisfies $\|r\| < \epsilon$. For clarity we refer to $u$ as an $\epsilon$-approximate solution when this is the case. In 
many cases the sparse matrix $A$ will be symmetric and positive definite, which makes the theoretical 
analysis easier, but we observe excellent convergence for rather general nonsymmetric systems as well. 
We assume that Eq. (1) is the discretization, by some linear process, of the continuous linear system 

$$\mathcal{A}u = v \tag{3}$$

on a smooth two- or three-dimensional manifold $\Omega$. One of the advantages of the algorithm PUMG 
is that this may as well be an unstructured discretization. Internally, PUMG uses a cell-based 
discretization algorithm to construct the coarse grid approximations, even though the given sparse 
linear system (1) may have been the result of a quite different discretization process. 

The continuous linear system (3) may well be the variational equation of a nonlinear system which 
is to be solved by Newton's method, or it may represent an implicit time step in the evolution of 
a hyperbolic system 

$$\mathcal{F}(t, u(x,t)) = v(x,t). \tag{4}$$

PUMG was developed for the implicit time step of a hyperbolic system of this form, namely the 
shallow water equations on a sphere, which explains our interest in efficiency. 

Higher order interpolation is the first key to higher performance in the unstructured multigrid 
algorithm PUMG. Since the concept of polynomial reconstruction on which we base our interpolation 
is not yet widely known, we devote the third section to a clarification of this idea and a description 
of how it is used to construct the interpolation operator Q used in PUMG. 

The second key to higher performance in PUMG is the use of more than one coarse grid at 
every level, in a manner somewhat analogous to that in the algorithm PSMG [13][14]. We make this 
concept of tree structured multigrid more precise in the fourth section. The third key is the use of a 
well tailored local approximate inverse in the smoothing step of PUMG. In the fifth section we discuss 
the quadrature based smoother QBS introduced by Baumgardner and Frederickson at the 1993 Copper 
Mountain Conference and contrast it with the ILU, LS and DB smoothers. 


UNSTRUCTURED CELLULAR DECOMPOSITIONS 


There is no longer any doubt about the advantages of unstructured grids in the high precision 
solution of many real world problems. Their flexibility allows the gridding of complicated domain 
shapes more readily, and allows local mesh refinement in regions where the solution develops high 
gradients. The early work of Bank and Sherman [3-5], Bank and Rose [2], and others gives ample 
evidence of this, along with the fact that multigrid can be adapted to the solution of these problems. 
Convergence rates for unstructured multigrid algorithms remain somewhat slower than they are for 
classic multigrid, however, which is one of the motivations for the current algorithm. We observe that 
general cellular decompositions of a domain offer computational advantages over decompositions that 
use only tetrahedra and hexahedra. For example, several times as many tetrahedra of a given maximal 
diameter are required to fill a region as are required for well-proportioned cells. This increases the 
cost of most aspects of the computation. We claim that the concept of cell center is not important 
in cellular discretizations, as it is better to think of a quantity as distributed over the cell rather than 
located at any one point. For higher order accuracy this distribution will, of course, be nonuniform. 


HIGHER ORDER POLYNOMIAL RECONSTRUCTION 


The first key to high performance in a multilevel solver for unstructured grids is a high-order 
interpolation operator Q for transferring a subgrid solution to the next higher level. From the cellular 
discretization viewpoint, this implies a model for the distribution within each coarse cell of the 
variable to be interpolated to the finer cells. 

We will construct this distribution using a polynomial reconstruction algorithm $R$ that constructs 
a polynomial $p_i$ in each cell $C_i$ using the state $u_j$ in neighboring cells $C_j$. Exactly how we choose 
a neighborhood will depend on several factors, including the desired degree $k$ of the reconstruction. 
We will require at least $\binom{k+d}{d}$ cells, including cell $C_i$ itself, to reconstruct a polynomial of degree 
$k$ in $d$ dimensions. When the cellular decomposition is fairly uniform we usually find that the cells 
contiguous to a given cell, together with that cell, form a sufficiently large neighborhood to support 
quadratic reconstruction. Boundary cells will need to use a more one-sided neighborhood if they 
require the same degree of reconstruction. The effect of this is not so severe if there is a layer or two 
of smaller cells near the boundary, with further refinement in the corners. This boundary refinement 
is often advantageous for a variety of other reasons, and the freedom to apply it is one of the advantages of an unstructured grid. 
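
The lower bound on the neighborhood size is the number of coefficients of a polynomial of degree at most $k$ in $d$ variables, which is the binomial coefficient $\binom{k+d}{d}$. A minimal sketch (the function name `min_cells` is ours) tabulates it for the cases of interest:

```python
from math import comb


def min_cells(k, d):
    """Number of coefficients of a polynomial of degree at most k in d variables."""
    return comb(k + d, d)


for d in (2, 3):
    for k in (1, 2, 3):
        print(f"d = {d}, k = {k}: at least {min_cells(k, d)} cells")
```

Quadratic reconstruction therefore needs at least 6 cells in two dimensions and 10 in three, which is why the contiguous cells of a fairly uniform decomposition usually suffice.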

To make the concept of neighborhood precise we denote by $N_i$ the set of indices $j$ of the cells $C_j$ 
in the chosen neighborhood of cell $C_i$. For simplicity we will assume that the system we are solving 
is scalar, and we will represent it with a state vector $u = (u_i)$. The vector of polynomials that results 
from the reconstruction will be denoted $p = (p_i)$ in the following discussion. In each case the index 
$i$ runs over the list of cells in the unstructured grid. 

We will define an operator $R$ that constructs a polynomial of degree $k$ in each cell to be a k-exact 
polynomial reconstruction operator if it satisfies the following three axioms: 

Axiom 1: The operator $R$ preserves cell averages. If we denote by $S$ the discretization operator that 
computes the average of a variable over each cell, then $R$ satisfies 

$$p = Ru \implies u = Sp.$$

Axiom 2: The operator $R$ is k-exact in that it reproduces polynomials of degree $k$ exactly: 

$$p \in \mathcal{P}_k, \; u = Sp \implies p = Ru.$$

Axiom 3: The operator $R$ is local in that it constructs the polynomial in cell $C_i$ using the values of 
$u_j$ in neighboring cells $C_j$ only: 

$$u_j = 0 \;\; \forall j \in N_i \implies (Ru)_i = 0.$$

Note that Axiom 1 simply states that $R$ is a right inverse of the averaging operator $S$: 

$$SR = I \tag{5}$$

and Axiom 2 states that $R$ is a left inverse to $S$ restricted to the space of degree $k$ polynomials: 

$$RSp = p, \quad p \in \mathcal{P}_k. \tag{6}$$

From these two equations it should not be surprising that the reconstruction operator $R$ is a 
pseudo-inverse of some sort. In fact, the construction below builds $R$ as a sparse block matrix, each row of 
which constitutes a Moore-Penrose pseudo-inverse of $S$. 

This concept of k-exact reconstruction on unstructured grids was introduced by Frederickson and 
Barth [12] for use in a high-order CFD solver on unstructured grids, and has been further developed 
by Barth [6], Coirier and Powell [9], and others for a variety of applications in fluid dynamics. 
This appears to be the first application of k-exact reconstruction to an unstructured and non-nested 
multigrid solver. 


Numerical Construction of R 


We prescribe a unique operator R satisfying these three axioms by choosing the one with the 
smallest coefficients, in the least squares sense. This description of R as the solution of a variational 
problem allows it to be constructed with the linear least squares procedure that we describe below. 
Better yet, this construction is fairly inexpensive, as each row of R, when represented as a block 
matrix, is constructed independently. The computation proceeds as follows: 


Step 1: For each $j$, choose a list $N_j$ of $n_j$ cells that neighbor the cell $C_j$ (including the cell $C_j$ 
itself), enough so that $n_j \ge \binom{k+d}{d}$. 

Step 2: For each $j$, choose a local basis $\langle p_1, p_2, \ldots, p_m \rangle$ for $\mathcal{P}_k$ on the cell $C_j$, compute 
the averages of these $m$ basis functions on cell $C_j$, and enter these as the $j$-th 
column of the matrix $W$: 

$$W_{ij} = (S p_i)_j.$$

Step 3: For each $j$, form the $m$ by $n_j$ matrix $V$ by deleting all columns of $W$ not in $N_j$ and 
translating the remaining columns to the local coordinate system of cell $C_j$. Beginning 
with a QR-factorization of the matrix $V^H$, compute the Moore-Penrose pseudo-inverse 
of $V$ and enter this as the $j$-th row of the sparse block matrix representation of the 
reconstruction operator $R$. During the QR-factorization make sure that the matrix is of full 
rank; otherwise the list of neighbors must be increased. 
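
To make the three steps concrete, the sketch below carries them out for a single cell in one dimension with $k = 1$. It is a simplified illustration under stated assumptions, not the authors' code: the function names are ours, the basis is $\{1, x - x_c\}$, and the least-squares problem is solved through the 2 x 2 normal equations instead of a QR- or LQ-factorization. It demonstrates k-exactness (Axiom 2) only; for data that are not linear it does not enforce the cell-average constraint of Axiom 1.

```python
def reconstruct_linear(cells, u):
    """Least-squares linear reconstruction p(x) = a0 + a1 * (x - xc) on cells[0].

    cells: list of (left, right) intervals; cells[0] is the cell being reconstructed.
    u:     cell averages, in the same order as cells.
    Returns the coefficients (a0, a1).
    """
    xc = 0.5 * (cells[0][0] + cells[0][1])
    # v[i][j] = average of basis function i over neighbor j (basis: 1 and x - xc).
    v = [[1.0 for _ in cells],
         [0.5 * (lo + hi) - xc for lo, hi in cells]]
    # Normal equations (V V^T) a = V u; V V^T is only 2 x 2 here.
    g00 = sum(t * t for t in v[0])
    g01 = sum(s * t for s, t in zip(v[0], v[1]))
    g11 = sum(t * t for t in v[1])
    b0 = sum(s * t for s, t in zip(v[0], u))
    b1 = sum(s * t for s, t in zip(v[1], u))
    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-14:
        raise ValueError("rank-deficient neighborhood: add more neighbors")
    return ((g11 * b0 - g01 * b1) / det, (g00 * b1 - g01 * b0) / det)


def average_linear(c0, c1, cell):
    """Exact average of c0 + c1 * x over the interval cell = (left, right)."""
    return c0 + c1 * 0.5 * (cell[0] + cell[1])


# Non-uniform neighborhood: the cell itself first, then its two neighbors.
cells = [(1.0, 1.5), (0.0, 1.0), (1.5, 3.0)]
u = [average_linear(3.0, 2.0, c) for c in cells]    # averages of 3 + 2x
a0, a1 = reconstruct_linear(cells, u)
```

For the linear data used here the reconstruction returns the polynomial exactly, written about the cell center $x_c = 1.25$: $a_0 = 3 + 2 x_c = 5.5$ and $a_1 = 2$.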

In practice we avoid forming the matrix $V^H$ explicitly for input to the QR algorithm, and instead 
apply the transpose of the QR-factorization algorithm to $V$ itself. This modified algorithm factors $V$ 
into the product of a lower triangular matrix $L$ and an orthogonal matrix $Q$, and therefore is sometimes 
referred to as the LQ-factorization of a matrix. We recommend the use of either Householder reflections 
or Givens rotations in carrying out the factorization. Finally, we wish to warn the reader of a 
notational conflict: the Q and R of QR-factorization have nothing to do with the reconstruction 
operator $R$ or the interpolation operator $Q$ that we discuss next. 


The Operator Q 

The interpolation operator $Q$ that transfers the state $u$ from a coarse grid to a finer grid is constructed 
from the reconstruction operator $R$ in such a way that each application of the operator $Q$ is equivalent 
to the following three steps: 

Step 1: Reconstruct $p_i = (Ru)_i$ on each cell $C_i$ of the coarse grid. 

Step 2: Intersect each cell $C_i$ of the coarse grid with every cell of the finer grid, and transfer 
$p_i$ to that intersection. 

Step 3: On each cell $C_{i'}$ of the fine grid apply the averaging operator $S$ to the resulting piecewise 
polynomial function. 

For the sake of computational efficiency, however, this use of the reconstruction operator $R$ in 
constructing $Q$ is carried out only during the setup phase, and results in an explicit sparse matrix 
representation of the interpolation operator $Q$. The most difficult step in this construction is forming 
the grid which is the intersection of the coarse grid and the fine grid because our grids are not generally 
nested. In saying this we assume that the averaging operator $S$, which is able to compute the average 
over an arbitrary cell of a polynomial $p \in \mathcal{P}_k$, is already available. 
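
The three steps can be sketched in one dimension, where the intersection of a coarse cell with a fine cell is simply an interval overlap. The code below is an illustrative assumption rather than the PUMG implementation: the name `interpolate`, the interval representation of cells, and the use of already-reconstructed linear polynomials are ours.

```python
def interpolate(coarse, polys, fine):
    """Average piecewise-linear coarse reconstructions over each fine cell.

    coarse: list of (left, right) coarse cells covering the domain
    polys:  list of (a0, a1) per coarse cell, meaning p(x) = a0 + a1 * (x - center)
    fine:   list of (left, right) fine cells, not necessarily nested in the coarse ones
    """
    result = []
    for fl, fr in fine:
        total = 0.0
        for (cl, cr), (a0, a1) in zip(coarse, polys):
            lo, hi = max(fl, cl), min(fr, cr)       # intersection of the two cells
            if hi <= lo:
                continue
            center = 0.5 * (cl + cr)
            mid = 0.5 * (lo + hi)
            # The integral of a linear function is its midpoint value times the length.
            total += (a0 + a1 * (mid - center)) * (hi - lo)
        result.append(total / (fr - fl))
    return result


coarse = [(0.0, 1.0), (1.0, 2.0)]
polys = [(1.5, 1.0), (2.5, 1.0)]                    # both represent p(x) = 1 + x
fine = [(0.0, 0.75), (0.75, 1.5), (1.5, 2.0)]       # the middle cell straddles both
fine_avg = interpolate(coarse, polys, fine)
```

Because both coarse polynomials represent the same linear function, the fine-cell averages are exact even for the fine cell that straddles the two coarse cells. In a setup phase the same loop would be run once per coarse basis function to assemble the sparse matrix for $Q$.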


TREE STRUCTURED PARALLEL MULTIGRID 


The second key to rapid convergence is the use of more than one coarse grid at every level, 
resulting in a tree structured algorithm. The convergence is sufficiently faster for difficult problems 
to justify the somewhat greater computational complexity, which is not excessive if the subgrids are 
coarse enough. The code complexity is not significantly greater, as the code fragment shown in Figure 
1 demonstrates. The variable node in this routine points to a data structure that contains everything 
that would be needed at one level of an ordinary multigrid algorithm such as FAPIN; in addition, 
this data structure includes pointers to the nodes that contain the next finer grid and pointers to the 
nodes that contain all coarser grids. 


LOCAL APPROXIMATE INVERSES AS SMOOTHERS 


The third key to the strong convergence of PUMG is the use of a well engineered local approximate 
inverse $Z$ to remove the high frequency part of the error via the two step smoother 

$$r \leftarrow v - Au \tag{7}$$

$$u \leftarrow u + Zr. \tag{8}$$
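
The two-step smoother is easy to exercise on a small model problem. In the sketch below the matrix is the one-dimensional Laplacian and $Z$ is a damped Jacobi matrix, $Z = \tfrac{2}{3}\,\mathrm{diag}(A)^{-1}$; that choice of $Z$ is an assumed stand-in for illustration and is not one of the approximate inverses discussed in this paper. The function names are ours.

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def smooth(a, z, u, v):
    """One defect-correction step: r <- v - A u, then u <- u + Z r."""
    r = [vi - ai for vi, ai in zip(v, matvec(a, u))]
    return [ui + di for ui, di in zip(u, matvec(z, r))]


def residual_norm(a, u, v):
    """Euclidean norm of the residual v - A u."""
    return sum((vi - ai) ** 2 for vi, ai in zip(v, matvec(a, u))) ** 0.5


n = 8
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
# Assumed stand-in for Z: damped Jacobi, Z = (2/3) * diag(A)^{-1}.
z = [[(2.0 / 3.0) / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
v = [0.0] * n                                # zero right-hand side: u is the error
u0 = [(-1.0) ** i for i in range(n)]         # highly oscillatory initial error
u1 = smooth(a, z, u0, v)
```

One step reduces the oscillatory error from amplitude 1 to at most 1/3, which is the behavior a smoother is meant to have on high frequency components.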

The widely used ILU smoother, or incomplete Cholesky smoother introduced by Van der Vorst [10] 
and Meijerink and Van der Vorst [15], is almost of this type, but not quite, for although $Z = U^{-1}L^{-1}$ 
is implicitly local and can be applied at much the same cost as the other three smoothers described 
below, it has a derivation which differs considerably. The idea behind incomplete Cholesky is to 
follow the Cholesky algorithm for computing the lower triangular factor $L$ and the upper triangular 
factor $U$ with one exception: as each element is computed, it is set equal to zero if it is located outside 
of the prescribed neighborhood $N$ of the identity. In the earliest version this was taken to be exactly 
the nonzero set of the sparse matrix $A$, and in later versions this was enlarged for difficult problems, 
to avoid zeroing out elements of significant size. 

void PUMG::solve( pumg_node *node ){ 
    if( NULL == node->upper_node ){ 
        // No finer grid above this node: form the residual of the given problem. 
        node->data_in_u = 1; 
        resid( node->r, node->v, node->A, node->u, node ); 
    } 
    else{ 
        // Coarser grid: its right-hand side is the projected residual of the finer grid. 
        node->data_in_u = 0; 
        project( node->v, node->P, node->upper_node->r, node ); 
        copy( node->r, node->v, node ); 
    } 
    smooth( node->u, node->Z, node->r, node ); 
    resid( node->r, node->v, node->A, node->u, node ); 
    // Recurse into every coarse grid attached to this node. 
    for( int i=0; i<node->num_sub_grids; i++ ){ 
        PUMG::solve( &(node->lower_node[i][0]) ); 
    } 
    if( 0 < node->num_sub_grids ){ 
        resid( node->r, node->v, node->A, node->u, node ); 
    } 
    smooth( node->u, node->Z, node->r, node ); 
    // Pass the result on this grid up to the finer grid, if there is one. 
    if( NULL != node->upper_node ) 
        interp( node->upper_node->u, node->Q, node->u, node ); 
} 


Figure 1. The main loop of PUMG. 

The ultimate goal of the approximate inverse smoother $Z$ is to minimize the spectral radius of 
the whole multigrid cycle, with the sparsity pattern of $Z$ as the only constraint. Although it is easy 
enough to construct such an optimal $Z$ for constant coefficient periodic problems, as demonstrated in 
[14], it becomes rather expensive for general unstructured grids. A much less costly approach is to 
focus on the smoother step alone, and construct $Z$ so that 

$$\|(I - AZ)r\| \tag{9}$$

is small for all $r$ of high spatial frequency. More precisely, we would like to minimize the maximum 
of this expression as $r$ varies over the null space of the projection operator $P$. Alternatively, one could 
focus on the errors rather than the residuals, and seek $Z$ such that 

$$\|(I - ZA)e\| \tag{10}$$

is small for all $e$ such that $Ae$ is in the same null space. These are global optimization problems, 
however, and expensive enough that we would prefer a local alternative. We describe three effective 
alternatives below. 


The LS Approximate Inverse 


The first of these, the LS approximate inverse $Z$, is constructed explicitly as the minimum of the 
quadratic functional 

$$M(Z) = \|I - AZ\|_F^2 = \sum_{i,j} \bigl| (I - AZ)_{i,j} \bigr|^2 = \sum_i \bigl( (I - Z^H A^H)(I - AZ) \bigr)_{i,i} \tag{11}$$

(the square of the Frobenius norm) subject to the constraint that $Z$ must vanish outside a chosen 
neighborhood $N$ of the identity. The quadratic functional $M$ has the property that the minimizing $Z$ is 
easily computed one column at a time. To see this, let $z^k$ denote the $k$-th column of $Z$, let $y^k = A z^k$, 
and let $N_k$ denote the $k$-th column of $N$, namely the set of indices $i$ such that $z_i^k$ (or $Z_{i,k}$) is allowed 
to be non-zero. Then 

$$M(Z) = \sum_k M_k(Z), \tag{12}$$

where 

$$M_k(Z) = \sum_i \bigl| \delta_{i,k} - y_i^k \bigr|^2. \tag{13}$$

Thus to construct the optimal $Z$ we only need to choose $z^k$ to minimize $M_k(Z)$ and do so for each 
$k$. But this optimal $z^k$ satisfies the system of equations 

$$\sum_j B_{i,j}\, z_j^k = (A^H)_{i,k}, \quad i \in N_k, \tag{14}$$

where 

$$B_{i,j} = \begin{cases} (A^H A)_{i,j} & \text{if } i \in N_k \text{ and } j \in N_k \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

If we wish to focus on the errors, rather than the residuals, and ask that $Z$ minimize $\|I - ZA\|_F$ subject 
to the same sparsity constraints, we find that the rows of $Z$ satisfy essentially the same equation, but 
with $AA^H$ replacing $A^H A$ in the definition of the small matrix $B$, and with $A$ rather than $A^H$ on 
the right hand side. 


The LS approximate inverse was introduced by M. W. Benson in his 1973 thesis, referenced in 
[7], and has served as an effective smoother in the V-cycle multigrid algorithm FAPIN for twenty 
years now. (See [8] and [1] and the reports referenced therein.) 
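
The column-by-column construction of Eqs. (14) and (15) can be sketched for a small real matrix. The code below is an illustration under stated assumptions, not FAPIN code: the matrix is stored densely, the sparsity pattern of $Z$ is taken to be that of $A$, and the helper names are ours.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ls_approximate_inverse(a, pattern):
    """Minimize ||I - A Z||_F one column at a time, with column k supported on pattern[k]."""
    n = len(a)
    z = [[0.0] * n for _ in range(n)]
    for k in range(n):
        idx = pattern[k]
        # B = (A^T A) restricted to idx; right-hand side = (A^T)_{i,k} for i in idx.
        b = [[sum(a[r][i] * a[r][j] for r in range(n)) for j in idx] for i in idx]
        rhs = [a[k][i] for i in idx]
        for i, zi in zip(idx, solve(b, rhs)):
            z[i][k] = zi
    return z


def frobenius_defect(a, z):
    """Frobenius norm of I - A Z."""
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            az = sum(a[i][r] * z[r][j] for r in range(n))
            total += ((1.0 if i == j else 0.0) - az) ** 2
    return total ** 0.5


n = 6
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
pattern = [[i for i in range(n) if a[i][k] != 0.0] for k in range(n)]
z_ls = ls_approximate_inverse(a, pattern)
z_jacobi = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
```

Since the diagonal matrix `z_jacobi` lies within the same sparsity pattern, the defect of `z_ls` cannot exceed that of Jacobi; for this matrix it is strictly smaller.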


The DB Approximate Inverse 

When the sparse matrix $A$ is self-adjoint there is another approximate inverse that is even easier 
to compute because it does not involve forming blocks of the normal matrix $A^H A$; in most situations 
this approximate inverse works almost as well as the LS approximate inverse in a defect-correction 
smoother. The idea is to weight the quadratic functional $M(Z)$ with $A^{-1}$: 

$$M(Z) = \sum_i \bigl( (I - Z^H A^H) A^{-1} (I - AZ) \bigr)_{i,i} = \sum_i \bigl( A^{-1} - Z^H - Z + Z^H A Z \bigr)_{i,i}, \tag{16}$$

and observe that the columns of the optimizing $Z$ now satisfy the system of equations 

$$\sum_j A_{i,j}\, z_j^k = \delta_{i,k}, \quad i \in N_k. \tag{17}$$

This is also discussed in [7] and the reports listed therein. 
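
For a symmetric matrix, each column of the DB approximate inverse is obtained by solving a small system with the diagonal block of $A$ selected by the sparsity pattern. The sketch below is an illustration under stated assumptions (dense storage, pattern equal to that of $A$, helper names ours), not code from the reports cited above.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def db_approximate_inverse(a, pattern):
    """For each column k solve A[idx, idx] z = e_k[idx] with idx = pattern[k] (A symmetric)."""
    n = len(a)
    z = [[0.0] * n for _ in range(n)]
    for k in range(n):
        idx = pattern[k]
        block = [[a[i][j] for j in idx] for i in idx]
        rhs = [1.0 if i == k else 0.0 for i in idx]
        for i, zi in zip(idx, solve(block, rhs)):
            z[i][k] = zi
    return z


n = 5
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
pattern = [[i for i in range(n) if a[i][k] != 0.0] for k in range(n)]
z_db = db_approximate_inverse(a, pattern)
# Entries (A Z)_{i,k}; they equal delta_{i,k} wherever i belongs to pattern[k].
az = [[sum(a[i][j] * z_db[j][k] for j in range(n)) for k in range(n)] for i in range(n)]
```

No products of the form $A^H A$ are formed: each column costs one solve with a block of $A$ whose order is the number of nonzeros allowed in that column.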
The Quadrature Based Smoother (QBS) 


For an even more effective smoother we recommend the quadrature based smoother introduced 
by Baumgardner and Frederickson at the 1993 Copper Mountain conference. The idea behind QBS 
is to minimize a quadratic functional of the form 

$$\mathcal{F}(Z) = \sum_i c_i\, \|(I - ZA)e_i\|_2^2, \tag{18}$$

where the carefully chosen set of errors $e_i$, together with the weight $c_i$ associated with each, are chosen 
to represent the expected error before smoothing. In particular, they are chosen so that the associated 
residuals $r_i = Ae_i$ span the null space of the projection operator $P$. This quadratic functional may be 
viewed as a quadrature approximation to an energy integral of the form 

$$\mathcal{F}(Z) = \int_{\ell_2} \|(I - ZA)e\|_2^2 \, d\mu(e),$$

where the measure $\mu$ is chosen to represent the energy in the error just before smoothing. Indeed, 
they may be put exactly into this integral form by the simple expedient of choosing the measure $\mu$ 
to vanish except at the points $e_i$ and giving it the value $c_i$ there. It is this viewpoint that allows us 
to refer to the errors $e_i$ as quadrature points in $\ell_2$. 

Because these quadratic functionals are positive definite, they each have a unique minimum when 
restricted to the space of sparse matrices $Z$ of given sparsity structure. In every case the algorithm 
for constructing this optimum $Z$ is entirely local, because the first variation $\delta\mathcal{F}$ is a block diagonal 
matrix, each block of which involves only one row of $Z$ and nearby rows of $A$. To be specific, the 
(sparse) row $z = z_i$ of the sparse matrix $Z$ satisfies an equation of the form 

$$Wz = b, \tag{19}$$

in which $W$ and $b$ are constructed as follows. For each of the $e_k$, evaluate $y_k = s_i A e_k$, where $s_i$ 
denotes restriction to the sparsity pattern of the row $z_i$. Let $\epsilon_k$ denote the element of the vector $e_k$ in that row. Then 

$$W = \sum_k c_k\, y_k y_k^T \quad \text{and} \quad b = \sum_k c_k\, y_k \epsilon_k. \tag{20}$$

Because $Z$ is sparse this system is easily solved, for the order of the system is the number of nonzeros in 
a row of $Z$. We have found that by choosing the quadrature points $e_k$ and the weights $c_k$ appropriately 
we are able to construct a nearly optimum smoother. In very difficult situations, with strongly varying 
coefficients and/or cell sizes in some localities, we allow $Z$ to fill a larger neighborhood of the identity 
in these localities in order to obtain a spatially uniform rate of convergence. All in all, our best 
smoother for these problems is QBS. When there are few enough of the $e_k$, this least squares approach 
reduces to an exact solve, and the resulting smoother annihilates these errors exactly. In this case 
we might also refer to $Z$ as a collocation approximate inverse by analogy with other collocation 
algorithms in numerical analysis. 
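
The construction of a single row of $Z$ from $Wz = b$ can be sketched as follows. The code is an illustration under stated assumptions, not the authors' implementation: the matrix is the one-dimensional Laplacian stored densely, the three quadrature errors and unit weights are chosen arbitrarily for the example, and the helper names are ours. With as many independent quadrature errors as nonzeros in the row, the least squares problem becomes an exact solve, which is the collocation case described above.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def qbs_row(a, row, pattern, errors, weights):
    """Nonzeros of one row of Z from W z = b, W = sum c y y^T, b = sum c y eps.

    y   = (A e) restricted to the column indices in `pattern`
    eps = entry `row` of the error vector e
    """
    n, p = len(a), len(pattern)
    w = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for e, c in zip(errors, weights):
        ae = [sum(a[i][j] * e[j] for j in range(n)) for i in range(n)]
        y = [ae[j] for j in pattern]
        for s in range(p):
            b[s] += c * y[s] * e[row]
            for t in range(p):
                w[s][t] += c * y[s] * y[t]
    return solve(w, b)


n = 5
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
row, pattern = 2, [1, 2, 3]                   # tridiagonal sparsity for row 2 of Z
errors = [[1.0, -1.0, 1.0, -1.0, 1.0],        # hypothetical quadrature errors
          [0.0, 1.0, 0.0, -1.0, 0.0],
          [1.0, 0.0, -1.0, 0.0, 1.0]]
weights = [1.0, 1.0, 1.0]
z_row = qbs_row(a, row, pattern, errors, weights)


def smoothed_entry(e):
    """Entry `row` of (I - Z A) e, using only the computed row of Z."""
    ae = [sum(a[i][j] * e[j] for j in range(n)) for i in range(n)]
    return e[row] - sum(zj * ae[j] for zj, j in zip(z_row, pattern))
```

For this example the row of $Z$ comes out as $(1/8,\ 1/2,\ 1/8)$ on the three allowed columns, and `smoothed_entry` vanishes for each of the three quadrature errors.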
CONCLUSION 


We find that higher order interpolation, important to faster convergence in an unstructured multigrid 
algorithm, can be effectively constructed using the polynomial reconstruction algorithm of Barth 
and Frederickson. Tree structured multigrid algorithms are an additional means of gaining faster 
convergence in unstructured multigrid. The most efficient smoothers for unstructured problems of this 
sort are the quadrature based smoothers introduced in 1993 by Baumgardner and Frederickson. 

We would like to conclude with a note of thanks to Craig and Marietta Douglas for the bibliography 
[11] which they have constructed for our use, and to Steve McCormick for organizing this sequence 
of conferences. The catalytic effect on multigrid research of both of these efforts is clear to all of us. 
A CELL-CENTERED MULTIGRID ALGORITHM
FOR ALL GRID SIZES* 


Thor Gjesdal 

Christian Michelsen Research A/S 
N-5036 Fantoft, Norway


SUMMARY 


Multigrid methods are optimal; that is, their rate of convergence is independent 
of the number of grid points, because they use a nested sequence of coarse grids to 
represent different scales of the solution. This nesting does, however, usually lead 
to certain restrictions of the permissible size of the discretised problem. In cases 
where the modeler is free to specify the whole problem, such constraints are of lit- 
tle importance because they can be taken into consideration from the outset. We 
consider the situation in which there are other competing constraints on the resolution.
These restrictions may stem from the physical problem (e.g., if the discretised
operator contains experimental data measured on a fixed grid) or from the need to 
avoid limitations set by the hardware. In this paper we discuss a modification to 
the cell-centered multigrid algorithm, so that it can be used for problems with any 
resolution. We discuss in particular a coarsening strategy and choice of intergrid 
transfer operators that can handle grids with either an even or an odd number of cells.
The method is described and applied to linear equations obtained by discretisation 
of two- and three-dimensional second-order elliptic PDEs. 


INTRODUCTION 


Multigrid methods have during the last decades developed into an important tool 
in many areas of scientific computation. Because they use a nested sequence of grids 
to represent different scales of the solution, the multigrid algorithms are optimal in 
the sense that their computational complexity is linearly proportional to the total 
number of unknowns in the discretised problem. This nesting does however lead to 
certain restrictions on the permissible size of the discrete problem, similar to those 
encountered in other efficient ‘divide-and-conquer’ algorithms such as the fast Fourier 
transform or cyclic reduction. In a standard multigrid algorithm, coarsening is usually 

*This work was supported by the Research Council of Norway through Grant number 100556/410 
and program number STP-30074. 
performed by doubling the mesh-spacing. The number of cells in the grid will then
be of the form $n_l = C_l 2^k$, where $n_l$ is the number of cells in direction $l$ and $C_l$ is some
suitable (small) integer. Early applications of multigrid methods for general grid sizes
consisted of padding the fine grid with empty cells. Such padding can lead to potentially
large overheads in storage requirements and computational complexity. Dendy [1] and
Adams [2] have both described modifications of vertex-centered multigrid algorithms
that are extended to handle general grid sizes. In cases with an odd number of cells,
Dendy employs a dummy point on the coarse grid, while Adams has devised a special 
coarsening strategy using a uniform grid spacing at all levels. 

Often the modeler is free to specify the whole problem, and then such constraints 
are of no importance because they can be taken into consideration from the outset. 
We consider here the situation in which there are other competing constraints on 
the resolution. These restrictions may stem from the physical problem (e.g., if the 
discretised operator contains experimental data measured on a fixed grid) or from 
the need to avoid limitations set by the hardware. We believe that these restrictions 
must be overcome if the multigrid methods are ever to become a standard inventory 
in the modeler’s toolbox. 

In this paper we discuss a modification to the cell-centered multigrid algorithm, 
so that it can be used for problems with any resolution. The cell-centered algorithm 
is attractive because cell-centered discretisations are in widespread use, and cell- 
centered multigrid also has the ability to handle problems with discontinuous or 
rapidly varying diffusion coefficients using standard grid transfer operators [3, 4, 5]. 

In the next section we will describe the method with special emphasis on the grid 
coarsening and construction of the intergrid transfer functions. We will apply the 
method to linear equations obtained by discretisation of two- and three-dimensional 
second-order elliptic PDEs and show that the convergence rates are indeed independent
of the grid size (even for grids with an odd number of cells).

MULTIGRID ALGORITHM 
Two level algorithm 

To describe the method, we will consider a two-level algorithm for the discretised
problem 

Au = b, 

where A is the discretised differential operator, which we assume is linear. The 
two-level algorithm consists of a smoothing step and a correction step where the 
update to the solution is calculated on a coarse grid. The two components of the 
multigrid algorithm are complementary; that is, smoothing is used to reduce high 
frequency error components, while the coarse grid correction is good at eliminating 
low frequencies in the error. We will denote coarse grid quantities by an overbar, and 
we can then write the algorithm in symbolic form as 

$M = S^{\nu_2} (I - P \bar{A}^{-1} R A) S^{\nu_1}$,
Level      5         4         3         2        1

nx x ny    64 x 64   32 x 32   16 x 16   8 x 8    4 x 4
           65 x 17   33 x 9    17 x 5    9 x 5    5 x 5

Figure 1: Multigrid hierarchy for five-level system to illustrate grid coarsening strategy.

where M is the two-level error reduction operator; R, P are the restriction and prolongation
operators, and S is the smoothing operator with $\nu_1, \nu_2$ the number of pre-
and post-smoothing sweeps, respectively. We obtain the multigrid algorithm by recursive
application of the two-level algorithm to solve the coarse level defect equation
$\bar{A}\bar{e} = R r = R(b - Au)$.
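The two-level cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-dimensional cell-centered Poisson model problem, the dense-matrix helpers, the point Gauss-Seidel smoother, and all function names are hypothetical choices made for brevity and are not the solver used in this paper. The transfer operators are a piecewise constant prolongation and a restriction with rows (1/4)[1 3 3 1], for an even number of cells only.

```python
def poisson_1d(n):
    """Cell-centered 1D Laplacian (scaled by h^2), Dirichlet walls via ghost cells."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    A[0][0] = A[n - 1][n - 1] = 3.0
    return A

def prolongation(n):
    """Piecewise constant: fine cells 2I and 2I+1 copy coarse cell I (n even)."""
    P = [[0.0] * (n // 2) for _ in range(n)]
    for i in range(n):
        P[i][i // 2] = 1.0
    return P

def restriction(n):
    """Rows (1/4)[1 3 3 1], truncated at the walls (n even)."""
    R = [[0.0] * n for _ in range(n // 2)]
    for I in range(n // 2):
        for j, w in zip(range(2 * I - 1, 2 * I + 3), (0.25, 0.75, 0.75, 0.25)):
            if 0 <= j < n:
                R[I][j] = w
    return R

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(X, v):
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]

def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (coarse-grid solve)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def gauss_seidel(A, u, b, sweeps):
    """Lexicographic point Gauss-Seidel, updating u in place."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (b[i] - s) / A[i][i]

def two_grid_cycle(A, R, P, Ac, u, b, nu1=1, nu2=1):
    gauss_seidel(A, u, b, nu1)                          # pre-smoothing
    r = [bi - ai for bi, ai in zip(b, matvec(A, u))]    # fine-grid residual
    ec = solve(Ac, matvec(R, r))                        # coarse defect equation
    for i, c in enumerate(matvec(P, ec)):               # prolongate and correct
        u[i] += c
    gauss_seidel(A, u, b, nu2)                          # post-smoothing

n = 16
A, P, R = poisson_1d(n), prolongation(n), restriction(n)
Ac = matmul(matmul(R, A), P)                            # Galerkin coarse operator
b, u = [1.0] * n, [0.0] * n

def residual_norm(u):
    return max(abs(bi - ai) for bi, ai in zip(b, matvec(A, u)))

history = [residual_norm(u)]
for _ in range(10):
    two_grid_cycle(A, R, P, Ac, u, b)
    history.append(residual_norm(u))
```

In this sketch the interior rows of the Galerkin operator come out as (1/2)[-1 2 -1], that is, the fine-grid stencil rescaled to the doubled mesh spacing. A multigrid cycle would replace the direct coarse solve by a recursive call.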

Grid coarsening 

For a given fine grid, we choose the coarse grid size as 

$\bar{n}_l = \lfloor n_l/2 \rfloor + \mathrm{mod}(n_l, 2)$ (1)

where $\lfloor \cdot \rfloor$ is the floor function. Standard coarsening, or coarsening in all coordinate
directions, is performed as far as possible. For rectangular grids, semi-coarsening is
then continued until the coarsest grid has a small number of cells in each direction. In 
this way, a coarse grid hierarchy is defined for any fine grid, and multigrid iterations 
can be performed. To illustrate the coarsening strategy, figure 1 shows an example 
of two five-level systems. The first example shows the standard case with a suitable
number of cells and full coarsening in four levels. In the second example we have ‘bad’
numbers (an odd number of cells in both directions and a moderately rectangular
grid). Then we apply full coarsening for two levels and continue semi-coarsening for
two levels to obtain a small system on the coarsest grid. 
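The coarsening strategy can be illustrated with a short Python sketch. The stopping rule is an assumption: a direction is coarsened only while it has more than `n_min` cells, and the value `n_min = 5` is a hypothetical threshold chosen here because it reproduces both hierarchies of figure 1. The paper does not state the exact criterion.

```python
def coarse_size(n):
    """Coarse grid size from equation (1): floor(n / 2) + mod(n, 2)."""
    return n // 2 + n % 2

def grid_hierarchy(fine, n_min=5):
    """List of grid sizes from finest to coarsest.

    Assumed rule: a direction is coarsened while it has more than n_min
    cells; directions that are already small are left alone, which gives
    semi-coarsening on the remaining levels.
    """
    levels = [tuple(fine)]
    while any(n > n_min for n in levels[-1]):
        levels.append(tuple(coarse_size(n) if n > n_min else n
                            for n in levels[-1]))
    return levels

for fine in [(64, 64), (65, 17)]:
    print(" -> ".join(f"{nx} x {ny}" for nx, ny in grid_hierarchy(fine)))
```

For an odd number of cells the formula keeps one extra coarse cell, so 65 cells coarsen to 33, then 17, 9, and 5.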

Transfer operators 

In this section we will describe the restriction and prolongation operators. For
simplicity we will concentrate on the one-dimensional operators. We will then describe
briefly how we obtain the higher dimensional operators. 

The grid transfer operators must satisfy the well-known accuracy requirement 

$m_R + m_P > 2M$,

where $m_R$ and $m_P$ are the order of the restriction and the prolongation, respectively,
and $2M$ is the order of the differential operator. The order of the grid transfer operators
and this rule can be determined either by considering how the interpolation acts
on the Fourier components (Brandt [6], Hemker [7]) or the order of the polynomials
used in the interpolation rule (Hackbusch [8]). If we consider second order elliptic
operators, we will use a restriction based on linear interpolation, which gives $m_R = 2$,
and for the prolongation we will use piecewise constant interpolation ($m_P = 1$). This
seems to be more robust than the opposite alternative ($m_R = 1$, $m_P = 2$) [9].

Prolongation, or coarse-to-fine interpolation, is performed by cell-based piecewise 
constant interpolation; that is, the coarse grid function values are transferred directly 
to the fine-grid points that belong to each coarse grid cell. 

The fine-to-coarse restriction is defined by the average 

$\bar{u}_i = (Ru)_i = \sum_j R(i,j)\, u_{\alpha i + j}, \quad \alpha = [n/\bar{n}]$. (2)

The one-dimensional restriction operator is given by the adjoint of linear interpola- 
tion. In the standard case, where n is an even number, this restriction is simply given 
by the stencil 

$R = \frac{1}{4} [\, 1 \;\; 3 \;\; 3 \;\; 1 \,]$.

In general, when n is either odd or even, we can envisage two methods to construct the 
restriction operator. First, we can adopt an approach akin to that of Adams [2] and 
assume that both the coarse and the fine computational grids are given as a uniform 
distribution of cells on the unit interval, with spacing h = 1/n. The restriction 
weights can then be calculated by 

$R(i, j) = \max \{ 0,\; 1 - | (2i + j - \tfrac{1}{2}) h - (i - \tfrac{1}{2}) \bar{h} | / \bar{h} \}$. (3)

This will give a three- or four-point stencil in all cases. These restriction weights
should be scaled so that they add up to the ratio $n/\bar{n}$ [10]. If the fine grid has an
even number of cells, this formula will reproduce the standard stencil.
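The uniform-grid construction can be checked numerically with the sketch below. It assumes our reading of the weight formula: a hat function of half-width equal to the coarse spacing, centered on the coarse cell center and sampled at the fine cell centers. The function names and the 1-based cell indexing are hypothetical.

```python
def restriction_row(i, n_fine, n_coarse):
    """Unscaled weights of coarse cell i (1-based) on uniform grids.

    Assumption: weight = max(0, 1 - |x_fine - x_coarse| / h_coarse), with
    cell centers at (m - 1/2) h and (i - 1/2) h_coarse on the unit interval.
    """
    h, hc = 1.0 / n_fine, 1.0 / n_coarse
    xc = (i - 0.5) * hc
    row = {}
    for m in range(1, n_fine + 1):
        w = max(0.0, 1.0 - abs((m - 0.5) * h - xc) / hc)
        if w > 0.0:
            row[m] = w
    return row

def scaled(row, n_fine, n_coarse):
    """Rescale the weights so that they add up to n_fine / n_coarse."""
    s = sum(row.values())
    return {m: w * (n_fine / n_coarse) / s for m, w in row.items()}

even = restriction_row(2, 8, 4)                 # interior cell, even fine grid
odd = scaled(restriction_row(2, 5, 3), 5, 3)    # odd fine grid, 5 -> 3 cells
print(even)
print(odd)
```

For the even grid the interior row is (1/4)[1 3 3 1] without any rescaling; for the odd grid the middle coarse cell receives a three-point stencil.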

This approach will unfortunately not produce a stable coarse-grid operator when 
we use the Galerkin approximation, and as a consequence the convergence rate of the 
method will deteriorate. We have therefore instead developed a restriction operator 
based on true cell-based coarsening. In this case, one cell at the boundary will be 
identical to the boundary cell at the finer level, as illustrated in figure 2. A similar 
approach was suggested by Hutchinson and Raithby [11] in connection with the use
of a low-order restriction operator. For a restriction based on the adjoint of linear 
interpolation we must modify the stencil in the cells close to the boundary. We get 

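$R(\bar{n}-1, :) = [\, \tfrac{1}{4} \;\; \tfrac{3}{4} \;\; 1-w \;\; 0 \,]$,

$R(\bar{n}, :) = [\, w \;\; 1 \;\; 0 \;\; 0 \,]$,

where $w = a/b$. If $k$ is the number of immediately preceding finer levels that have an
odd number of cells, $a$ and $b$ are given by

$a_1 = 1, \quad a_k = 2 a_{k-1}$,

$b_1 = 3, \quad b_k = 2 b_{k-1} - 1$.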
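The boundary weight can be tabulated with a few lines of Python. The sketch assumes the recurrences are read as a_1 = 1, a_k = 2 a_(k-1) and b_1 = 3, b_k = 2 b_(k-1) - 1; the function name is hypothetical. For k = 1 this gives w = 1/3, which agrees with linear interpolation between the centers of a boundary coarse cell of width h and its neighbour of width 2h.

```python
from fractions import Fraction

def boundary_weight(k):
    """Weight w = a_k / b_k after k consecutive finer levels with odd size."""
    a, b = 1, 3
    for _ in range(k - 1):
        a, b = 2 * a, 2 * b - 1
    return Fraction(a, b)

for k in range(1, 6):
    print(k, boundary_weight(k))
```

Under this reading a_k = 2^(k-1) and b_k = 2^k + 1, so the weight approaches 1/2 from below as the number of odd levels grows.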
[Figure panels: Fine grid, Coarse grid]

Figure 2: Fine and coarse cells at boundary when the fine grid has an odd number of
cells. The numbers indicate the restriction weights for the end point.


When semicoarsening is used, a direction exists in which $\bar{n} = n$. In this case
$\alpha = 1$, and both the restriction and the prolongation are given by the identity operator
$I = [\, 0 \;\; 1 \;\; 0 \,]$.

In the multidimensional case, the stencils for restriction and prolongation are 
determined by tensor products of the one-dimensional stencils. If we let $\mathbf{i}$ and $\mathbf{j}$
denote multi-indices, we will for example have for the restriction stencil in 3D

$R^{3D}(\mathbf{i}, \mathbf{j}) = R(i_1, j_1)\, R(i_2, j_2)\, R(i_3, j_3)$. (4)

In other words, in two and three dimensions restriction will be given by the adjoint
of bi- and tri-linear interpolation, respectively.
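The tensor-product construction is easy to express in code. The sketch below builds the two- and three-dimensional restriction stencils from the standard one-dimensional stencil (1/4)[1 3 3 1]; the dictionary representation keyed by offset multi-indices and the function name are illustrative assumptions.

```python
from itertools import product

def tensor_stencil(stencil_1d, dim):
    """Map each offset multi-index to the product of the 1D weights."""
    offsets = range(len(stencil_1d))
    out = {}
    for j in product(offsets, repeat=dim):
        w = 1.0
        for jd in j:
            w *= stencil_1d[jd]
        out[j] = w
    return out

R1 = [0.25, 0.75, 0.75, 0.25]
R2 = tensor_stencil(R1, 2)   # 16 weights
R3 = tensor_stencil(R1, 3)   # 64 weights
print(sum(R2.values()), sum(R3.values()))
```

Because the one-dimensional weights add up to 2, the two- and three-dimensional weights add up to 4 and 8, the number of fine cells per coarse cell under standard coarsening.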


Coarse grid approximation 


The coarse grid matrices are determined by the use of the Galerkin approximation
$\bar{A} = RAP$. The Galerkin coarse grid approximation is preferable to straightforward
discretisation, because the coarse grid operator can be automatically calculated
from the fine grid stencils. This can give the multigrid solver the appearance of a 
black-box solver where the user only has to supply the coefficients of the discretised 
equations. 

Because the restriction operator is based on bi- and tri-linear interpolation in 
higher dimensions, the coarse grid stencil will be full (9 points in 2D, 27 points
in 3D). The stencil elements can readily be calculated by the algorithm given by
Wesseling [10]. 


Smoothing 

A point that should be noted is that the coarsening strategy we described in the 
previous section may change the (an-)isotropy of the coarse-grid operator compared 
to the operator on the fine grid. This might have to be taken into consideration when
we select the smoother. In two dimensions the alternating line Gauss-Seidel method 
is a robust smoother that is not too expensive. In practice its performance is often 
quite comparable to Red-Black point relaxation even for isotropic operators. In 3D, 
the only really robust smoother is alternating plane relaxation, which unfortunately
is rather expensive even if a multigrid method is used to solve the two-dimensional
planes. It is therefore difficult to recommend this smoother without reservation. If
we have enough knowledge of the problem at hand to decide that a line relaxation 
method would suffice, a potential gain can be harvested, but for a truly black-box 
solution plane relaxation is probably the safest bet. 


Implementation aspects 


In this section we will discuss briefly some practical aspects of the algorithm. The 
use of one-dimensional interpolation rules makes the implementation fairly modular, 
and by using features such as derived data types and dynamic memory allocation that 
are now available in Fortran we have written a combined two- and three-dimensional
solver where the PDE can be discretised on any compact stencil (the most common
are 5, 7, or 9 points in 2D and 7, 12, 19, or 27 points in 3D). With the use of recursion,
it is also possible that the solver calls itself for plane smoothing in 3D problems. 

Depending on how restriction and prolongation are treated, a modest overhead related
to the transfer operations will be realized. In our implementation, this overhead
is on the order of $n_x \times n_y \times n_z$ floating point operations (roughly equivalent to one
residual calculation on the fine grid) per iteration and $n_x + n_y + n_z$ memory locations.
The overhead related to work can however be eliminated if all stencil elements are 
precomputed and stored. This option will of course lead to a larger storage penalty. 

COMPUTATIONAL EXAMPLES 

In this section we will demonstrate the convergence of the method for some selected 
test examples. 


Laplace/Poisson equation 


First we consider the Laplace equation on rectangular regions with Dirichlet or
Neumann boundary conditions. 


$\nabla^2 u = 0, \quad x \in \Omega$,

$u = 1, \quad x \in \partial\Omega$, or

$\partial u / \partial n = 0, \quad x \in \partial\Omega$.
Table 1: Two-Dimensional Laplace Equation with Dirichlet Boundary Conditions

Grid size    Levels   Reduction factor   Iterations
31 x 31      4        0.051              10
32 x 32      4        0.052              10
33 x 33      4        0.052              10
50 x 50      5        0.050              10
63 x 63      5        0.074              11
64 x 64      5        0.053              10
65 x 65      5        0.059              10
99 x 99      6        0.061              10
150 x 150    6        0.053              10
65 x 33      5        0.058              10
101 x 50     6        0.053              10


2D and 3D calculations with a uniform fine grid 

The first set of calculations is performed on the Laplace equation with Dirichlet 
boundary conditions in a case in which the grid spacing is the same in each direction, 
so that the fine-grid operator is isotropic. The iterations start off from random initial 
values in the unknowns and are performed until the residual norm is reduced by a 
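factor of $10^{-12}$. The average residual reduction rate, $\kappa$, is defined as

$\kappa = \left( \|r_n\| / \|r_0\| \right)^{1/n}$,

where $r_n$ is the residual after $n$ iterations.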


Table 1 shows results of two-dimensional calculations using alternating line Gauss- 
Seidel as the smoother for a series of different grid sizes. The results are given for a 
V(0,1) (sawtooth) cycle. We see from the table that the method works equally well 
for problem sizes that include both ‘good’ and ‘bad’ multigrid numbers. 
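As a rough consistency check on the tabulated values, the sketch below computes the average reduction rate from two residual norms and, conversely, the number of iterations a given rate needs to reach the stopping tolerance of 10^-12. It assumes the usual definition, the n-th root of the ratio of final to initial residual norm; the function names are hypothetical. The tabulated rates are rounded averages, so exact agreement is not guaranteed in general.

```python
import math

def average_reduction(r0, rn, n):
    """Average residual reduction rate over n iterations."""
    return (rn / r0) ** (1.0 / n)

def iterations_needed(kappa, target=1e-12):
    """Smallest n with kappa**n <= target."""
    return math.ceil(math.log(target) / math.log(kappa))

# Rates taken from Table 1 (31 x 31 and 63 x 63 grids)
print(iterations_needed(0.051))   # 10
print(iterations_needed(0.074))   # 11
```

Both counts agree with the iteration numbers reported in table 1.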

Results of three-dimensional calculations are given in table 2. The results indicate 
that the alternating line smoother is sufficiently robust to handle cases in which either 
odd-numbered grids or semi-coarsening lead to anisotropy in the coarse grid problems


Table 2: Three-Dimensional Laplace Equation with Dirichlet Boundary Conditions

                Line GS                           Plane GS
Grid size       Reduction   Iterations   time     Reduction   Iterations   time
31 x 31 x 31    0.036       9            0.93     0.016       7            5.36
32 x 32 x 32    0.031       8            1.00     0.015       7            5.67
33 x 33 x 33    0.031       8            1.07     0.029       8            7.73
32 x 15 x 19    0.034       8            0.29     0.014       7            1.81
Table 3: Two-Dimensional Laplace Equation with Homogeneous Neumann Boundary
Conditions on a Stretched Grid

Stretching factor:   1.0                1.2                100
Grid size            $\kappa$   n       $\kappa$   n       $\kappa$   n
8 x 8                0.139      8       0.136      8       0.024      5
16 x 16              0.169      9       0.153      8       0.067      7
32 x 32              0.210      9       0.221      9       Diverge


as long as the problem on the fine grid is isotropic. The dramatic slow-down seen 
in the case where we use alternating plane relaxation may be caused by the start-up 
overhead of the multigrid solver. A way to alleviate this might be to rework the 
plane-smoother to precompute the coefficients in all the planes. This will however 
lead to a considerable storage overhead. Another alternative is to investigate whether 
the three-dimensional coefficients that are already computed can be used also in the 
plane solver. 

Examples with a non-uniform grid 

In this section, we will study the effect of anisotropy by introducing a nonuniform 
grid. Botta and Wubs [12] have shown that solution of partial differential equations 
on a stretched grid can pose a challenging test case for iterative methods. One of 
their test cases consists of the two-dimensional Laplace equation on the unit square 
with homogeneous Neumann boundary conditions. The initial field is given by
$f = x^2 (1 - y)^2$, and the convergence criterion used is that the absolute error should be
below $10^{-6}$. The grid is generated by geometric stretching so that $s = h_{i+1} / h_i$ is
constant. We use a V(0,1) cycle and the alternating line smoother; the results are 
shown in table 3. We note that the method fails to converge for large values of the 
stretching factor. The experiments indicate that a critical stretching factor exists 
depending on the grid size, and that iterations will diverge if the stretching is greater 
than this limit. In practice, however, we will only encounter moderate stretching,
because an appreciable loss of accuracy occurs even for stretching factors larger than,
say, $s \approx 5/4$.
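To indicate how quickly geometric stretching produces anisotropy, the following sketch generates the cell widths of a geometrically stretched grid on the unit interval. It assumes that the stretching factor is the constant ratio of successive cell widths and that the widths add up to one; the function name is hypothetical.

```python
def stretched_widths(n, s):
    """Cell widths h_1..h_n on the unit interval with h_(i+1) / h_i = s."""
    if s == 1.0:
        return [1.0 / n] * n
    h1 = (s - 1.0) / (s ** n - 1.0)      # first width from the geometric sum
    return [h1 * s ** i for i in range(n)]

widths = stretched_widths(8, 1.2)
print(widths[-1] / widths[0])            # ratio of largest to smallest cell
```

The ratio of the largest to the smallest cell width is s^(n-1), which already exceeds 280 for s = 1.2 on a grid with 32 cells per direction.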

We also performed the same experiment in a 3D case with a $32^3$ grid, and we
noticed that for moderate stretching rates, $s < 2$, we obtained essentially no degradation
in the convergence using the alternating line smoother. For $s = 5$, we did,
however, notice a significant slow-down as expected. 


Stone’s problem 


This problem was introduced by Stone [13] as a test case for the Strongly Implicit 
Procedure (SIP), which is a relaxation method based on an incomplete LU decompo- 
sition. The problem consists of a heterogeneous diffusion problem on the unit square 
Figure 3: Geometry for Stone’s model problem.

given by 

$\nabla \cdot \mathrm{diag}(K_x, K_y) \nabla u = f, \quad (x, y) \in [0, 1]^2$,

$n \cdot \nabla u \,|_{\Gamma} = 0$.

The geometry of the problem, specifying the conductivities and the sources, is de- 
picted in figure 3. This problem was solved on a grid with 30 x 30 cells, using 4 levels 
in the multigrid iterations. The initial field was identically zero. The convergence 
factors for this problem are given in table 4. 

SIMPLE pressure correction equation 

The pressure correction equation in the SIMPLE algorithm for solution of the 
incompressible Navier-Stokes equations can be interpreted as an elliptic operator in












Table 5: Pressure Correction Equation in the Two Backward Facing Step Examples;
Results for the First 20 Outer Iterations

             BFS                BFS-POR
Iteration    $\kappa$   n       $\kappa$   n
1            0.100      9       0.125      10
2            0.101      9       0.253      15
3            0.096      8       0.267      15
4            0.090      8       0.234      13
5            0.081      8       0.174      11
10           0.080      7       0.185      11
20           0.096      8       0.179      10
Average      0.089              0.191


the form 

$D(\gamma G p') = D u$,

where D and G are discretised divergence and gradient operators, respectively, p' is the 
pressure correction and u is the intermediate velocity field. The diffusivity $\gamma$ consists
of the inverse of a diagonalisation of the momentum operator and geometric terms. 
The equation is usually employed with homogeneous Neumann boundary conditions. 
The multigrid solver has been used to solve the pressure correction equation for the 
test case of a backward-facing step at Re = 800. The step size was half the channel 
height, and the channel had a length of 30 step heights. The grid had 30 x 10 cells, 
which gives a grid aspect ratio of 5 : 1. 

In the first test, the standard set-up from the benchmark results of Gartling [14] 
was used. In order to assess the effect of the use of porosities on convergence rates, 
we performed a second test in which a thin vertical plate with zero porosity and a 
height equal to the step height was placed in the channel downstream of the two main 
recirculation bubbles. Table 5 shows the residual reduction rates for the multigrid 
solver in the two examples. The results show that even though the convergence of the 
solver deteriorates in the case for which we have zero porosity (and as a consequence, 
zero diffusivity $\gamma$), its performance is still good.

CONCLUSIONS 


We have described a generalisation of the cell-centered multigrid algorithm to 
cover problems with general resolution. Smoothing and the multigrid scheduling are 
not affected by the extensions, but changes have been made in the grid coarsening 
strategy and, consequently, in the design of the intergrid transfer operators. We found 
that cell-based coarsening was better than a mapping of the coarse grid to a uniform 
mesh. Numerical experiments show that the solver gives multigrid convergence for all 
grid sizes in a number of test cases. The alternating line Gauss-Seidel relaxation is a
good smoother for the two-dimensional solver. Its performance was also satisfactory 
in some 3D problems. In some cases involving extreme grid stretching the method 
seems to fail. This failure is, however, of small practical importance. 
Abstract

A robust solver for the elliptic grid generation equations is sought via a numerical study. The 
system of PDEs is discretized with finite differences, and multigrid methods are applied to 
the resulting nonlinear algebraic equations. Multigrid iterations are compared with respect
to robustness and efficiency. Different smoothers are tried to improve the convergence of
iterations. The methods are applied to four 2D grid generation problems over a wide range 
of grid distortions. The results of the study help to select smoothing schemes and the overall 
multigrid procedures for elliptic grid generation. 


INTRODUCTION 


Numerical grid generation arose from the need to compute solutions of partial differential equations 
defined over physical domains with complicated geometry. By transforming a physical domain to a 
simpler computational region (e.g., a square or a cube), the complication of the shape of the domain 
is removed from the problem. Although the transformed PDEs over the simple region are usually 
more complicated, they are easier to discretize with finite difference or finite volume methods. The 
domain transformation can be viewed as an introduction of a general curvilinear grid on the original 
domain. This explains the name: grid generation. 

The basic grid generation problem can be formulated in the following way: given a physical 
domain $\Omega \subset R^d$, a computational domain $U \subset R^d$, and a nonsingular parametric mapping $\partial x$ of the
domain boundaries

$\partial x : \partial U \to \partial\Omega$,

extend this mapping to a mapping

$x : U \to \Omega$

from the computational region to the physical domain. Here $d$ denotes the dimension of the space
containing $\Omega$, e.g., $d = 2$ describes planar problems. Such a mapping $x$ is called a boundary-
conforming map, and the map generates a boundary-conforming curvilinear grid in domain $\Omega$.

Elliptic grid generation is one of the popular methods for constructing boundary conforming grids. 
It constructs the grid mapping x as the solution of a system of elliptic partial differential equations 
defined on $U$ subject to the boundary condition satisfying $x(\partial U) = \partial\Omega$. A major advantage of this
approach is that the curvilinear grid in $\Omega$ is smooth, which results in small truncation errors in finite
difference discretizations of transformed differential equations. A major disadvantage is that the 
grid construction itself involves a numerical solution of a system of quasi-linear elliptic PDEs and 
requires much longer execution times than other types of grid generation (algebraic, parabolic, or
hyperbolic). 


ELLIPTIC GRID GENERATION AND DISCRETIZATION 


A general elliptic grid generation system of equations can be written as 


$Tx = F$, (1)

with Dirichlet boundary conditions $x(\partial U) = \partial\Omega$, where $T$ is a second order, quasi-linear, elliptic
differential operator and $F$ is the inhomogeneous part of the system (usually a first order differential
operator).

A widely used elliptic system for grid generation (see [2]) is the inhomogeneous Thompson- 
Thames-Mastin (ITTM) generator given by the following equations (in two dimensions):

$\left( g_{22} \frac{\partial^2}{\partial \xi^2} - 2 g_{12} \frac{\partial^2}{\partial \xi \partial \eta} + g_{11} \frac{\partial^2}{\partial \eta^2} \right) x = -g_{22}\, p\, x_\xi - g_{11}\, q\, x_\eta$, (2)

where $g_{11} = x_\xi \cdot x_\xi$, $g_{12} = x_\xi \cdot x_\eta$, $g_{22} = x_\eta \cdot x_\eta$ are the elements of the metric matrix and $p$, $q$ are user
defined functions which allow some measure of local control of grid cells. This system, together with
appropriate Dirichlet boundary conditions, defines the mapping $x : U \to \Omega$ from the computational
to the physical domain. For the derivation of the ITTM equations and other examples, see [2, 3].

Equations (2) may be solved numerically via standard central finite differences on a uniform, 
Cartesian grid in the computational domain U. Due to the presence of mixed derivatives, the nine- 
point stencil must be used. Grid points are indexed lexicographically by a two-tuple of integers 
$i = (i_1, i_2)$, $i_1 = 1, 2, \ldots, M$, $i_2 = 1, 2, \ldots, N$. Let $X = (X^1, X^2)$, where $X_i$ is the approximation
of $x(\xi_{i_1}, \eta_{i_2})$, be blockwise ordered. Orderings used in multigrid smoothers may be different and are
specified with the smoothers. In blockwise ordering we have $(\cdots, X^1_i, X^1_{i+1}, \cdots, X^2_i, X^2_{i+1}, \cdots)$. The
discretization results in the following nonlinear algebraic system: 


$\left( G_{22}(X)(D_{11} + P D_1) - 2 G_{12}(X) D_{12} + G_{11}(X)(D_{22} + Q D_2) \right) X = F$, (3)


where $G_{kl}$ are diagonal matrices with symmetric finite difference approximations of the metrics
$g_{kl}$ evaluated at the nodes indexed by the two-tuples $i$. The matrices $D_{11}$, $D_{12}$, $D_{22}$, $D_1$, and $D_2$
represent symmetric finite difference operators approximating the derivatives; $P$ and $Q$ are diagonal
matrices corresponding to the user defined functions p and q, and F is a vector containing the values 
of the mapping x at the boundary nodes. 

The nonlinear system (3) can be solved iteratively in one of several ways. The Newton iterations 
typically converge faster for problems with mild grid distortions but lead to singular Jacobians and 
divergent iterations for strong distortions. Lagging the metric coefficients $G_{kl}$ produces a more
robust but slower solver. At each step a linear system is to be solved of the form

$\left( G_{22}(X^n)(D_{11} + P D_1) - 2 G_{12}(X^n) D_{12} + G_{11}(X^n)(D_{22} + Q D_2) \right) X^{n+1} = F$, (4)

where $G_{kl}(X^n)$ denotes diagonal matrices with symmetric finite difference approximations of the
metrics $g_{kl}$ at $X^n$.

A simpler nonlinear iteration also lags the $P$ and $Q$ terms yielding the system

$\left( G_{22}(X^n) D_{11} - 2 G_{12}(X^n) D_{12} + G_{11}(X^n) D_{22} \right) X^{n+1} = F - \left( G_{22}(X^n) P D_1 + G_{11}(X^n) Q D_2 \right) X^n$. (5)

Using blockwise ordering both of these systems can be written in a block diagonal form


$AX = b$, (6)

where



At each step of the iteration the sparse nonsymmetric linear system (6) is solved using a linear 
multigrid method with a V-cycle. The initial guess X° is provided by an algebraic grid generation 
algorithm [2, 3]. 

In this paper we study the performance of various multigrid smoothers for equations (4) and (5).

MULTIGRID 


In the multigrid solution of linear systems (equations (4) and (5)) we have looked at 17 smoothers.
Seven of them were point Gauss-Seidel with various ordering: horizontal forward (PHF), vertical 
forward (PVF), vertical backward (PVB), horizontal symmetric (PHS), vertical symmetric (PVS), 
alternating forward (PAF), and alternating symmetric (PAS). Nine other smoothers were line Gauss-
Seidel variants: horizontal forward (LHF), vertical forward (LVF), vertical backward (LVB), horizontal
symmetric (LHS), vertical symmetric (LVS), alternating forward (LAF), alternating backward
(LAB), alternating symmetric (LAS), and alternating forward zebra (LAFZ). The last method considered
was the point incomplete LU factorization smoother (PILU). Obviously the above smoothers
have different complexity counts per iteration. Clearly, PVF and PVB have the same complexity
as the PHF smoother; PHS, PVS, and PAF require twice as many computations, and the PAS
smoother is 4 times as costly. The complexity of line smoothers is as follows: LHF, LVF and LVB
smoothers require 11/9 of PHF computations, the LHS, LVS, LAF, LAB, and LAFZ smoothers cost 
22/9 times more, and the LAS smoother takes about 4.5 times as long. 
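One way to compare smoothers of different cost is to normalise the reduction factor by the work per sweep. The sketch below records the relative costs quoted above (in units of one PHF sweep) and computes the reduction factor per unit of smoothing work. Two assumptions are made: the four-fold cost is taken to refer to PAS, and the cost of a V-cycle is taken to be proportional to the smoother cost alone, which ignores residual and transfer work. The cost of PILU is not given in the text and is omitted. All names are hypothetical.

```python
# Relative cost of one smoothing sweep, in units of one PHF sweep.
COST = {
    "PHF": 1.0, "PVF": 1.0, "PVB": 1.0,
    "PHS": 2.0, "PVS": 2.0, "PAF": 2.0,
    "PAS": 4.0,                                   # assumed reading of the text
    "LHF": 11 / 9, "LVF": 11 / 9, "LVB": 11 / 9,
    "LHS": 22 / 9, "LVS": 22 / 9, "LAF": 22 / 9, "LAB": 22 / 9, "LAFZ": 22 / 9,
    "LAS": 4.5,
}

def per_work_factor(rho, smoother):
    """Reduction factor per unit of smoothing work: rho ** (1 / cost)."""
    return rho ** (1.0 / COST[smoother])

# A symmetric sweep at twice the cost must at least square the reduction
# factor of the forward sweep to be worthwhile (illustrative numbers only).
print(per_work_factor(0.30, "PHF"), per_work_factor(0.25, "PHS"))
```

With these illustrative numbers the forward smoother has the smaller factor per unit of work (0.30 against 0.50), even though the symmetric smoother has the smaller factor per cycle.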

The linear multigrid algorithm used the V-cycle with single pre-smoothing and single post- 
smoothing at each level. The coarsest grid was as coarse as possible: it consisted of one internal 
point (and eight on the boundary). The Galerkin coarse grid approximation was used, and a direct 
solver applied on the coarsest level. The prolongation operators were bilinear, and the restriction 
operators were scaled adjoints of the prolongation operators. 


NUMERICAL EXAMPLES AND RESULTS 


To measure performance of the algorithms, the reduction factors were used. At each step of nonlinear 
iterations, $n$ multigrid V-cycles were applied. Define $r = \|R\|_{Frob} / N^3$, where $R = b - AX$ is the
residual of equation (6), and $\|\cdot\|_{Frob}$ denotes the Frobenius norm. Denoting the norm of the residual
after the $i$-th V-cycle by $r_i$ we define the $i$-th reduction factor $\rho_i$ by

$\rho_i = r_i / r_{i-1}$

and the average reduction factor by

$\rho = (r_n / r_0)^{1/n}$.

The four test problems, the square, the trapezoid, the L-shaped domain (backstep), and the 
airfoil, were chosen from the “grid gallery” given in [2]. The domains and some 16x16 curvilinear
grids generated by the ITTM equations are given in figure 1. 
Figure 1. Grids generated on various 16x16 domains: SQUARE, TRAPEZOID, BACKSTEP,
and AIRFOIL with elliptic generators.


The first and second problems are very simple with small mixed derivatives. Problems three 
and four are harder due to high values of mixed derivatives in some parts of the computational 
domain. The grid control functions p and q in the ITTM equations were chosen so that the grid cells 
were concentrated at the “bottom” of the domains. In the case of the airfoil the physical domain 
is doubly connected. A cut emanating from the tail of the airfoil enables it to be mapped onto a 
computational square. The bottom of the square corresponds to the surface of the airfoil. 

We performed a series of numerical experiments for all four test problems by varying the 
parameters of the control functions. First, the reduction factors of 25 V-cycles with different smoothers 
were measured for the first nonlinear iteration. To determine the best performance, the relaxation 
parameter $\omega$ was varied from 0.1 to 1.9 in steps of 0.1. Tables 1-4 give the reduction factors from the first 3 
V-cycles and the average reduction factor for all 17 smoothers. The listed results are for grid control 
functions $p = 0$ and $q(\xi, \eta) = -5 \exp(-5\eta)$ in the ITTM equations, but the results are typical for 
a wide range of control function parameters. The multigrid iteration terminated after 25 V-cycles, 
or after machine accuracy was reached, whichever occurred first. The last (“asymptotic”) reduction 
factor $\rho_{as}$ for each smoother is also given. If the last 3 reduction factors differ by less than 0.0005, 
then the value of $\rho_{as}$ is marked with an asterisk. 

From these tables, we make the following observations: 

• All point and line Gauss-Seidel smoothers work for all the test problems. The “optimal” 
relaxation parameters $\omega$ vary significantly as the problem changes, but all of them turn out to be 
larger than 1 (overrelaxation). However, the multigrid iterations converged for every tested value of 
$0.1 \le \omega < 2$. 

• With the PILU smoother, the relaxation parameter is much less sensitive to the changes of 
the problem and the grid control functions. In fact, the best values of $\omega$ were between 
0.7 and 0.9. However, the PILU smoother was divergent for the airfoil problem (example 4) for any 
value of $\omega$. 

• The decrease of the reduction factor obtained from applying symmetric or alternating 
Gauss-Seidel smoothers rather than forward (or backward) ones does not seem to justify the computational 
costs. The reduction factors for the former are larger than the square of the reduction factors for 
the latter. However, the issue of choosing the best ordering for the problem remains. The smoothers 
with similar computational complexities have widely different reduction factors in examples 3 and 
4. 

• Line Gauss-Seidel smoothers perform better than point Gauss-Seidel smoothers of similar 
complexity on more difficult problems (examples 3 and 4) and nearly the same on easy problems 
(examples 1 and 2). 
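The third observation can be checked with a simple cost argument. The sketch below assumes (this is an assumption, not a statement from the paper) that a symmetric sweep costs about as much as two forward sweeps, so it only pays off if its reduction factor is below the square of the forward one. The numbers are the average reduction factors listed for PHF/PHS in Table 1 and PVF/PVS in Table 4; the function name is hypothetical.

```python
def symmetric_sweep_pays_off(rho_forward, rho_symmetric):
    """True if one symmetric sweep beats two forward sweeps.

    Assumes a symmetric sweep costs the same as two forward sweeps.
    """
    return rho_symmetric < rho_forward ** 2


# (forward factor, symmetric factor), average values from Tables 1 and 4.
cases = {
    "SQUARE PHF vs PHS": (0.151, 0.071),
    "AIRFOIL PVF vs PVS": (0.612, 0.474),
}
results = {name: symmetric_sweep_pays_off(f, s) for name, (f, s) in cases.items()}
```

In both cases the squared forward factor (about 0.023 and 0.375) is smaller than the symmetric factor, which agrees with the observation in the text.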

With the above observations, the best smoothers were tested in the full nonlinear iteration for 
examples 3 (backstep) and 4 (airfoil). The iterations described in equations (4) and (5) were 
implemented with initial guesses supplied by an algebraic grid generator. The residuals $r_i$ after each 
nonlinear iteration were computed using the “exact” values of the metric tensor in the ITTM 
equation. The “exact” metric tensor was computed by running the iterations until machine convergence 
was reached prior to the actual tests. Fifty iterations were performed, unless the process was 
interrupted earlier when the residual reached $10^{-10}$. The results are contained in tables 5 and 6. The 
line Gauss-Seidel smoothers can be seen to converge faster. 

CONCLUDING REMARKS 

The object of the study, of which the preliminary results are reported here, was to select the most 
robust smoothers for multigrid in elliptic grid generation. Since the shape of the physical domain in 
the grid generation and the grid control functions usually induce elliptic grid equations with sharply 
varying coefficients (and possibly large convective terms), the optimal smoothers may be found 
among the line ILU methods. We plan to investigate this possibility next. We are also working on 
a study of smoothing for the full approximation storage (FAS) [1, 4] to apply multigrid directly to 
the nonlinear system (3). 

Table 1: 128x128 SQUARE Reduction Factors with “Optimal” Relaxation 

Smoother  omega   rho_1    rho_2    rho_3    l     rho_as   rho_avg
PHF       1.1              0.152    0.155    11    0.144*   0.151
PVF       1.2              0.149    0.123    12    0.239*   0.164
PVB       1.1              0.175    0.192    13    0.202*   0.189
PHS               0.084                      8              0.071
PVS               0.084                      8              0.071
PAF       1.4     0.065             0.059    7     0.060    0.060
PAS       1.4     0.027    0.021    0.022    6     0.026*   0.024
LHF       1.0              0.153             11
LVF       1.0              0.152             11
LVB       1.0     0.225    0.152             11
LHS       1.1     0.134             0.052    8              0.060
LVS       1.0     0.065             0.053    8              0.056
LAF               0.051             0.026    6              0.031
LAB       1.3     0.052             0.028                   0.032
LAFZ      1.0     0.048             0.049                   0.049
LAS       1.4     0.016    0.010    0.010    5     0.011    0.011
PILU      0.9                                6     0.022

* “asymptotic” 


Table 2: 128x128 TRAPEZOID Reduction Factors with “Optimal” Relaxation 

Smoother  omega   rho_1    rho_2    rho_3    l     rho_as   rho_avg
PHF                                                         0.214
PVF                        0.200                            0.206
PVB                        0.194             15    0.250*   0.227
PHS       1.3                                11
PVS       1.3                                10
PAF       1.3                                9     0.099
PAS       1.4     0.040    0.037    0.044    7              0.046
LHF                                 0.195                   0.216
LVF                                 0.171          0.186    0.177
LVB                                 0.200    13    0.198*   0.195
LHS               0.207                      10             0.101
LVS               0.089                      9              0.086
LAF       1.3     0.067    0.039    0.041    7     0.047    0.046
LAB       1.3     0.075    0.046    0.049    8     0.052*
LAFZ      1.1     0.108    0.058    0.077    9     0.099    0.089
LAS       1.5     0.050    0.015    0.014    6     0.016    0.019
PILU      0.9     0.059    0.032             9     0.191    0.093

* “asymptotic” 

Table 3: 128x128 BACKSTEP Reduction Factors with “Optimal” Relaxation 

Smoother  omega   rho_1    rho_2    rho_3    l     rho_as   rho_avg
PHF       1.5     0.356    0.349    0.467    24
PVF       1.7     0.551             0.490    24             0.645
PVB       1.6     0.749    0.428    0.463    25
PHS       1.5     0.179    0.281             25    0.745    0.630
PVS       1.7     0.311    0.394    0.452    25    0.618    0.549
PAF       1.7     0.276    0.321             23    0.626    0.543
PAS       1.8     0.243    0.318    0.376    25    0.459    0.418
LHF               0.550    0.451    0.536                   0.659
LVF               0.303    0.258    0.359                   0.416
LVB               0.313    0.306    0.412    24    0.547    0.500
LHS       1.7              0.441    0.489    25    0.613
LVS       1.4              0.193    0.188    22
LAF       1.6              0.147    0.199    18
LAB       1.6     0.194    0.202    0.220    22    0.358    0.307
LAFZ      1.4     0.723    0.104    0.201    23    0.527    0.465
LAS       1.7     0.081    0.103    0.119    14    0.236    0.153
PILU      0.7     0.148    0.115    0.115    14    0.167    0.151

* “asymptotic” 


Table 4: 128x128 AIRFOIL Reduction Factors with “Optimal” Relaxation 

Smoother  omega   rho_1    rho_2    rho_3    l     rho_as   rho_avg
PHF       1.6     0.856    0.464    0.515    20    0.683    0.609
PVF       1.6     0.900    0.404    0.433    20    0.685    0.612
PVB       1.6     0.747    0.509    0.472    25    0.825    0.684
PHS                                 0.494
PVS                                 0.480                   0.474
PAF       1.7              0.396    0.463                   0.478
PAS       1.7     0.221    0.252    0.233    20    0.406    0.292
LHF       1.6                       0.425    20    0.712
LVF       1.1                       0.171    15    0.226*
LVB       1.1                       0.171    15    0.226
LHS                        0.513
LVS                        0.057             11
LAF               0.119    0.123
LAB               0.119    0.124    0.118    9
LAFZ              0.298    0.080    0.097    12             0.123
LAS       1.2     0.041    0.032    0.036    9     0.087*   0.056
PILU      n/a     n/a      n/a      n/a      n/a   n/a      n/a

* “asymptotic” 

Table 5: Comparison of Smoothing Performance in Nonlinear Iterations for 128x128 BACKSTEP 
with Initial Residual r_0 = 4.30e-02 

Smoother        PVB         PVS         LVF         LAF
Equation (4)
  # iter        50          48          45          46
  r_last        1.12e-10    5.01e-11    9.10e-11    9.65e-11
  rho_avg       0.674       0.651       0.642       0.649
Equation (5)
  # iter        50          29          29          26
  r_last        1.32e-10    5.69e-11    4.23e-11    6.45e-11
  rho_avg       0.676       0.494       0.489       0.458

Table 6: Comparison of Smoothing Performance in Nonlinear Iterations for 128x128 AIRFOIL with 
Initial Residual r_0 = 5.85e-02 

Smoother        PHF         PVS         LVF         LVS
Equation (4)
  # iter        50          50          50          50
  r_last        6.49e-09    8.56e-09    6.80e-09    6.65e-09
  rho_avg       0.726       0.730       0.727       0.726
Equation (5)
  # iter        50          31          30          28
  r_last        5.15e-09    9.50e-11    4.58e-11    5.16e-11
  rho_avg       0.723       0.521       0.497       0.495

SUMMARY 

To solve a given fine mesh problem, the design of a multigrid method requires the defini- 
tion of coarse levels, associated coarse grid operators and inter-grid transfer operators. For 
non-structured simplicial meshes, these definitions can rely on the use of non-nested trian- 
gulations. These definitions can also be founded on agglomeration/aggregation techniques 
in a purely algebraic manner. This paper analyzes these two options, shows the connections 
of the volume-agglomeration method with algebraic methods and proposes a new definition 
of prolongation operator suitable for the application of the volume-agglomeration method 
to elliptic problems. 


1 Introduction 

Unstructured meshes are now a common tool in large scale scientific computing. With 
respect to structured grids, the use of this type of data representation offers the 
advantage of larger flexibility in adapting the mesh to complex geometries and complicated 
solutions. However, this approach also places a larger demand on the design of 
discretisation methods and solution algorithms. As a matter of fact, classical solvers 
using the regularity of the mesh may fail or become less efficient on non-structured 
meshes. Among the solvers that have appeared in the last two decades, multigrid 
type algorithms have been among the more successful. These methods were 
originally formulated for structured grids. To run efficiently on non-structured meshes, 
the solution algorithms have to be adapted or re-formulated. In structured MG 
algorithms, the building blocks of the methods are the inter-grid transfer and coarse 
grid operators. The main difficulty with these methods is, thus, to adequately design 
these operators. Unstructured multigrid algorithms add the additional difficulty of 
defining the coarse levels. This paper presents some approaches to solving this 
difficulty. 

We first consider geometrical methods that explicitly define a hierarchy of grids. The 
simplest method follows a coarse to fine path and generates fine levels starting from 
a given coarse level. More sophisticated methods generate all levels independently. 
These last methods place an excessive burden on the mesh generator algorithms, and 
we will indicate a possible way to automate them. The third geometrical method we 
will consider is the volume agglomeration MG technique of Lallemand & Dervieux [1] 
based on finite volume discretisations. 

Another possible way to face the difficulty of the generation of coarse levels is to rely 
on a purely algebraic method. These methods can be interesting because they avoid 
the geometrical complexities that make the generation of coarse levels tedious in the 
geometrical approaches. However, these methods are much more difficult to design 
and analyze. We show, however, that any algebraic method can be interpreted as a 
geometrical one and as an example analyze in this setting the volume agglomeration 
MG method. We first show that this method can be viewed as an equation summing 
technique and then use a geometrical interpretation to analyze some of its deficiencies. 
We then propose a possible way to improve this method. 

2 Geometrical methods 

In this section, we consider methods that explicitly define a hierarchy of grids. 

2.1 Nested mesh approach 

Let $\Omega$ be a bounded polygonal domain of the plane, and consider a coarse 
triangulation $\mathcal{T}_1$ of this domain defined by the set of nodes $\mathcal{N}_1$. For simplicity, we assume 
that this initial triangulation is regular and quasi-uniform with mesh parameter $h_1$. 
Associated with this triangulation, we consider the finite element space of piecewise 
linear functions $M_1$. The spaces $M_j$ will be recursively defined by adding nodes at 
the midpoints of the edges of the triangles of $\mathcal{T}_{j-1}$ and decomposing each triangle 
into four congruent triangles. We observe that the regularity and quasi-uniformity 
constants of the mesh are maintained by this process. Each element of $M_j$ belongs 
to $M_{j+1}$ and thus we obtain a sequence of nested spaces 

$$M_1 \subset M_2 \subset \cdots \subset M_n$$

such that $\dim M_j \approx 4 \dim M_{j-1}$. In addition, denoting $h_j = \max_{T \in \mathcal{T}_j} \{h_T\}$, we exactly 
have $h_{j+1} = h_j / 2$. 

To connect the different levels, we need linear operators between them. For this, 
we first equip each space $M_j$ with an inner product $(\cdot, \cdot)_j$ defined, for instance, by: 

$$(u, v)_j = h_j^2 \sum_{x \in \mathcal{N}_j} u(x)\, v(x) \qquad \forall (u, v) \in M_j^2$$

By the quasi-uniformity assumption, this inner product induces a norm denoted by 
$\|\cdot\|_j$ equivalent to the $L^2$ norm on $M_j$. The prolongation operator is simply the 
identity, and the restriction operator $\mathcal{I}_{j+1}^{j}$ can be defined by 

$$(\mathcal{I}_{j+1}^{j} u_{j+1}, v_j)_j = (u_{j+1}, v_j)_{j+1} \qquad (1)$$

$\mathcal{I}_{j+1}^{j}$ is then some kind of $L^2$ projection on $M_j$. 

We now describe the algebraic counterpart of these definitions. 

For each space $M_j$, we consider the usual nodal basis $\{\varphi_i^j\}$ defined to be 1 on node 
$x_i \in \mathcal{N}_j$ and 0 on all other nodes of $\mathcal{N}_j$. The choice of this basis induces a natural 
one-to-one mapping between $M_j$ and $\mathbb{R}^{n_j}$ ($n_j = \dim M_j$), which we denote as $\Gamma_j$: 

$$\Gamma_j : u \in \mathbb{R}^{n_j} \to \sum_{i=1}^{n_j} u_i \varphi_i^j \in M_j$$

For simplicity, each space $\mathbb{R}^{n_j}$ is equipped with the scalar product 

$$\langle u, v \rangle_j = h_j^2 \sum_{i=1}^{n_j} u_i v_i$$

such that we have 

$$\langle u, v \rangle_j = (\Gamma_j u, \Gamma_j v)_j$$

The identity of $M_j$, considered as an operator from $\mathbb{R}^{n_j}$ into $\mathbb{R}^{n_{j+1}}$, will be denoted 
by $I_j^{j+1}$ and is defined by 

$$\Gamma_j = \Gamma_{j+1} I_j^{j+1}$$

in such a way that the following diagram commutes (see Figure 1). 

$$\begin{array}{ccc} M_{j+1} & \xleftarrow{\ \Gamma_{j+1}\ } & \mathbb{R}^{n_{j+1}} \\ \uparrow \mathrm{Id} & & \uparrow I_j^{j+1} \\ M_j & \xleftarrow{\ \Gamma_j\ } & \mathbb{R}^{n_j} \end{array}$$

Figure 1: Commutative diagram defining the algebraic prolongation. 

From definition (1), we see that the algebraic expression of the restriction operator $I_j^{j-1}$ is given by 
the scaled matricial transpose of $I_{j-1}^{j}$. 

Remark 1: For future reference, we note that in the definition of these operators, 
the functional spaces are first defined, and then the algebraic expressions of the 
transfer operators are deduced from them. 
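The statement that the restriction is the scaled transpose of the prolongation can be checked numerically. The sketch below assumes the discrete inner product $h_j^2 \sum_i u_i v_i$ given above, so the scaling is $(h_{fine}/h_{coarse})^2$; the helper names and the small interpolation matrix are illustrative assumptions, not operators taken from the paper.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def inner(u, v, h):
    """Discrete inner product h^2 * sum(u_i v_i), as used in the text (2-D scaling)."""
    return h ** 2 * sum(a * b for a, b in zip(u, v))


def restriction_from_prolongation(P, h_fine, h_coarse):
    """Adjoint of P with respect to the two scaled inner products."""
    scale = (h_fine / h_coarse) ** 2
    return [[scale * p for p in row] for row in transpose(P)]


# Hypothetical prolongation: 3 coarse unknowns -> 5 fine unknowns (linear interpolation).
P = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
h_coarse, h_fine = 0.5, 0.25
R = restriction_from_prolongation(P, h_fine, h_coarse)

u_fine = [1.0, 2.0, 3.0, 4.0, 5.0]
v_coarse = [1.0, -1.0, 2.0]
lhs = inner(matvec(R, u_fine), v_coarse, h_coarse)   # <R u, v> on the coarse level
rhs = inner(u_fine, matvec(P, v_coarse), h_fine)     # <u, P v> on the fine level
```

With $h_{j+1} = h_j/2$ the scaling factor is 1/4, and `lhs` and `rhs` agree, which is the algebraic form of definition (1).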

The previous method is very simple and exactly fits into the classical variational 
multigrid theory. However, it implies a large dependence of the fine mesh node 
distribution on the coarsest level. Indeed, the mesh division algorithm is a fine mesh 
generation algorithm, and unfortunately this is a very poor one. Thus, the meshes 
generated this way are of poor quality (hence, the fine mesh solution will also be of 
poor quality). Moreover, in many cases, the fine mesh is given, and thus the solvers 
must be able to deal with an arbitrary given fine mesh instead of building it. 

A solution can be to relax the constraint of nestedness of the meshes; this is the 
non-nested approach that we describe now. 

2.2 Non-nested approach 

In this approach coarse and fine triangulations are generated independently using any 
given mesh generators. The solution, residuals, and corrections are transferred back 
and forth through the different levels using linear interpolation between two 
successive levels. Thus, $I_{j-1}^{j}$ now represents the linear interpolation between the non-nested 
spaces $M_{j-1}$ and $M_j$, and $I_j^{j-1}$ is its adjoint with respect to the inner products of 
$M_j$ and $M_{j-1}$. 

From a practical standpoint, the algorithms are the same regardless of whether or 
not the triangulations are nested; algebraic operators $I_{j-1}^{j}$ and $I_j^{j-1}$ are needed to 
transform the internal representation of coarse grid functions expressed in terms of 
basis functions of $M_{j-1}$ into their internal representations as fine level functions (in 
a different basis even in the case of nested spaces). Therefore, in the implementation, 
little difference exists between the nested and non-nested cases. However, the 
additional complexity of the non-nested approach appears in the fact that the transfer 
operators between the different levels are difficult to compute. Regardless of the order 
of the prolongation, one must determine in which coarse triangle each fine node is located. 
Thus, this approach requires the use of efficient search algorithms. 
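The location step can be sketched with barycentric coordinates: a fine node lies in a coarse triangle when all three coordinates are non-negative, and the same coordinates are the linear interpolation weights. The brute-force loop below is only an illustration under that assumption; a practical code would replace it with an efficient search (the function names are hypothetical).

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p in triangle tri = (v1, v2, v3)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def locate(p, triangles, tol=1e-12):
    """Return (triangle index, interpolation weights) or None if p is outside the mesh."""
    for idx, tri in enumerate(triangles):
        weights = barycentric(p, tri)
        if min(weights) >= -tol:
            return idx, weights
    return None


# Two coarse triangles covering the unit square; one fine node to locate.
coarse_triangles = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)),
                    ((1.0, 0.0), (1.0, 1.0), (0.0, 1.0))]
found = locate((0.75, 0.75), coarse_triangles)
```

The returned weights form one row of the prolongation matrix: the fine nodal value is the weighted sum of the three coarse nodal values.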

We also remark that there are now different choices for the definition of the coarse 
grid operator. The most natural one is to define it by a re-discretisation of the 
continuous problem on the coarse grid. This choice preserves the bandwidth of the original 
operators; however the alternate “Galerkin” definition 

$$A_{j-1} = I_j^{j-1} A_j I_{j-1}^{j}$$

can be more efficient (with this definition, the error after an exact correction step is 
purely in $A_j^{-1}\,\mathrm{Ker}(I_j^{j-1})$). Of course, the “Galerkin” definition does not preserve the 
bandwidth of the original operator. In CFD, the non-nested multigrid method appears 
to be one of the most successful strategies (see, for instance, [2]). However, the need 
to generate multiple meshes of the same geometries (that must, in addition, respect 
some ratio between the discretisation parameters of the different levels) results in an 
excessive burden on the user; a method that relies on the use of many independently 
generated meshes is simply not practical (especially in 3-D). Consequently, algorithms 
that consider the generation of the coarse level spaces as part of the solution procedure 
have to be developed to make these techniques practical. The following method [3] is 
designed for this task. A recent work by Chan and Smith [4] uses a similar idea. 
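The parenthetical remark about the error after an exact coarse grid correction follows from a short computation, sketched here in the notation of this section. The symbols $e$ and $e'$ for the error before and after the correction are introduced only for this illustration.

```latex
% Exact coarse grid correction with the Galerkin operator
% A_{j-1} = I_j^{j-1} A_j I_{j-1}^{j}:
\begin{align*}
e' &= \bigl(\mathrm{Id} - I_{j-1}^{j} A_{j-1}^{-1} I_j^{j-1} A_j\bigr)\, e, \\
I_j^{j-1} A_j e'
   &= I_j^{j-1} A_j e
      - \bigl(I_j^{j-1} A_j I_{j-1}^{j}\bigr) A_{j-1}^{-1} I_j^{j-1} A_j e \\
   &= I_j^{j-1} A_j e - I_j^{j-1} A_j e = 0 .
\end{align*}
% Hence A_j e' lies in Ker(I_j^{j-1}), i.e. e' lies in A_j^{-1} Ker(I_j^{j-1}).
```

In other words, the residual left after the correction is invisible to the restriction operator, so only the smoother can reduce it further.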

2.3 Node-nested algorithm 

For structured grids, coarse levels are generated by removing every second point in 
each coordinate direction and reconnecting the set of remaining nodes. This set of 
points thus forms a maximal independent subset of the vertices of the fine grid. We 
recall that a subset S of the vertices of a graph is said to be independent if no two 
vertices of S are connected by an edge of the graph. An independent subset S is called 
maximal if adding any further vertex makes it dependent. It is easily realized 
that in order to extract a coarse mesh with mesh parameter $\sim 2h$ from a given 
non-structured one with mesh parameter $h$, we precisely need to define a maximal 
independent subset from the vertices of the triangulation. Considering a vertex P 
in a maximal independent subset S, we see that its nearest neighbours in S are at 
a distance $\sim 2$ in the graph. Hence, their physical distance from P will be of order 
$2h$. In practice, it is not always desirable to form a maximal independent set. The 
major reason is that defining a maximal independent set from the boundary nodes 
can destroy the actual geometry. Typically in a maximal independent subset, half the 
boundary nodes are removed. However, some of the boundary nodes are crucial for 
the description of the geometry and have to be kept at every stage of the coarsening 
process. 

Therefore, the algorithm must first identify these nodes. Considering that the 
nodes that actually define the geometry are the nodes where the curvature is 
nonzero, this is done by computing the curvature at the boundary nodes and enforcing 
that the nodes having curvature above a given threshold are kept on all levels during 
the coarsening process. Once this identification process is performed, the following 
algorithm can be used: 

First, place all the nodes of the current triangulation in a list. Sort this list in such a 
way that the special nodes defining the geometry are listed first, the boundary nodes 
are listed next, and the interior nodes are listed last. Then apply: 

for every node in the list do 
    if node has not been removed 
    then select it and remove all its neighbours 
end do 

This algorithm reduces the number of nodes roughly by a factor of 4 and produces 
a set of unconnected nodes. A coarse mesh can be obtained from these points by 
triangulating them. It is in principle possible to use any mesh generation strategy to 
perform this task; here, a Delaunay-Voronoi method is used. 
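The selection loop above can be sketched as a greedy pass over the sorted node list. This is a minimal illustration under the assumption that the mesh connectivity is given as a symmetric adjacency dictionary; the function name and the small path graph are hypothetical, and the retriangulation step is not shown.

```python
def select_coarse_nodes(ordered_nodes, neighbours):
    """Greedy independent-set selection.

    ordered_nodes: nodes sorted as geometry nodes, then boundary nodes, then interior nodes.
    neighbours:    dict mapping each node to the nodes it shares an edge with (symmetric).
    """
    removed = set()
    selected = []
    for node in ordered_nodes:
        if node in removed:
            continue                       # already eliminated by a selected neighbour
        selected.append(node)              # keep this node on the coarse level
        removed.update(neighbours[node])   # its neighbours cannot be coarse nodes
    return selected


# Hypothetical 1-D chain 0-1-2-3-4; the end points play the role of boundary nodes.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
coarse = select_coarse_nodes([0, 4, 1, 2, 3], adjacency)
```

Because every node is either selected or adjacent to a selected node, the result is a maximal independent set with respect to the given ordering.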

Figure 2 presents an example of the application of this technique to a triangular mesh 
around a spark plug. 




Figure 2: Node-nested meshes around a spark plug. (a) Initial level. (b) After one 
coarsening. (c) Second coarsening. (d) Coarser mesh. 

The main advantage of this variation of the non-nested algorithm is that the 
generation of the coarse levels is part of the solution algorithm; the only input provided 
by the user is the fine mesh. In addition, the restriction and prolongation operators 
can be computed very efficiently, any node of the fine level being itself a coarse node 
or having at least one neighbor that is a coarse node. For applications of this method 
in CFD see [5] for Euler computations and [5], [6] for Navier-Stokes ones. 

2.4 Volume Agglomeration Multigrid method 

The previous methods appeal at one stage or another to complex geometrical 
information. The Volume Agglomeration MG of Lallemand and Dervieux [1] is an attempt 
to use only the minimal information given by the connectivity relation of the mesh. 
This method was originally introduced for first order hyperbolic problems such as 

$$\frac{\partial q}{\partial t} + \nabla \cdot F(q) = 0 \qquad (2)$$

and for finite volume discretisation. Consider, for instance (see Figure 4.a below), 
the dual control volume mesh of the triangulation of Figure 2.a. To form the coarse 
grid control volume meshes, neighboring cells are agglomerated together to form a 
larger coarse cell. This strategy is the exact counterpart of the cell-centered 
multigrid strategy devised by Wesseling for cell-centered structured algorithms [7]. Figure 
3 illustrates this strategy for structured meshes. Figure 4 illustrates it for the 
non-structured mesh of Figure 2. 

Figure 3: Cell centered multigrid methods. 

The coarse grids are thus composed of a tiling of 
the space made of arbitrary polygons. Devising a discretisation on these 
generalized meshes is not difficult for first-order equations. Application of a finite volume 
approach results in the same discrete formula as on a regular dual cell mesh: 

$$\frac{\partial}{\partial t} \int_{C} q \, dx + \sum_{e \subset \partial C} \int_{e} F(q) \cdot n \, dl = 0 \qquad (3)$$

and the same numerical flux that is used on the fine mesh can be used to evaluate the 
integrals. To set up a multigrid strategy, we also need to define the restriction and 
prolongation operators. According to the finite volume philosophy where the value 
associated with a cell can be interpreted as the mean value of the function on this 
cell, this is done by (we use here the simpler notation $j-1 \to H$ and $j \to h$ when 
only two levels are involved) 



Figure 4: Volume agglomerated meshes around a spark plug. (a) Initial level. (b) After one 
coarsening. (c) Second coarsening. (d) Coarser generalized mesh. 

Prolongation 

$$(I_H^h u_H)_j = (u_H)_{l(j)} \qquad (4)$$

where $l(j)$ is the coarse cell containing the fine cell whose index is $j$. 

Restriction (see footnote 1) 

$$(I_h^H u_h)_j = \sum_{i \in H_j} (u_h)_i \qquad (5)$$

where $H_j$ is the set of fine cells that constitute the coarse cell $C_j^H$. 
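Definitions (4) and (5) amount to an injection and a summation over each agglomerate. The sketch below stores the agglomeration as a list `coarse_of`, where `coarse_of[j]` plays the role of $l(j)$; the names and the five-cell example are assumptions made for illustration (0-based indices).

```python
def prolong(u_coarse, coarse_of):
    """Definition (4): every fine cell receives the value of the coarse cell containing it."""
    return [u_coarse[c] for c in coarse_of]


def restrict(u_fine, coarse_of, n_coarse):
    """Definition (5): each coarse cell receives the sum over its fine cells."""
    u_coarse = [0.0] * n_coarse
    for j, c in enumerate(coarse_of):
        u_coarse[c] += u_fine[j]
    return u_coarse


# Hypothetical agglomeration: fine cells 0,1 form coarse cell 0; fine cells 2,3,4 form coarse cell 1.
coarse_of = [0, 0, 1, 1, 1]
fine_from_coarse = prolong([10.0, 20.0], coarse_of)
coarse_from_fine = restrict([1.0, 2.0, 3.0, 4.0, 5.0], coarse_of, 2)
```

With these definitions the restriction is the plain (unscaled) transpose of the prolongation, which is what makes the algebraic interpretation in the next section simple.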

All the ingredients necessary to set up a multigrid strategy are thus at hand. 
Application of this technique to the 2-D Euler equations has been done in [1] and in 3-D 
in [8]. Additional computations have also been done in [9]. All these experiments 
reveal that the volume agglomeration method is extremely efficient for hyperbolic 
problems. The generation of the coarse grid is purely automatic and does not require 
complex mesh manipulations. The computational efficiency of the method is 
comparable to those of the non-nested approach and of regular structured MG techniques; it 
is seen that this method largely supersedes the non-nested multigrid methods. 
However, when applied to an elliptic problem, this technique experiences difficulties. A 
good way to understand these difficulties is to interpret this method as an algebraic 
one. 


3 Algebraic methods 


In the recent past, several attempts have been reported to use only algebraic 
information from the discrete problems to be solved. These methods are known as 
aggregation/disaggregation methods [10] or algebraic multigrid methods [11]. 
Suppose that we want to solve a linear system on $\mathbb{R}^n$: 

$$A x = f,$$

where $A$ is a symmetric, positive definite matrix. Let $\{e_i\}_{i=1,\ldots,n}$ be the canonical 
basis of $\mathbb{R}^n$, define $\{H_j\}_{j=1,\ldots,P}$ to be a partition of $\{1, \ldots, n\}$ into $P$ disjoint sets, 
and define two vectors $t, z \in \mathbb{R}^n$. Two linear operators between $\mathbb{R}^n$ and $\mathbb{R}^P$ can be 
constructed by: 

$I_h^H : \mathbb{R}^n \to \mathbb{R}^P$ : restriction or aggregation operator: 

$$I_h^H : \{(U_h)_j\}_{j=1,\ldots,n} \to \{(U_H)_j\}_{j=1,\ldots,P} = \Bigl\{ \sum_{k \in H_j} z_k (U_h)_k \Bigr\}_{j=1,\ldots,P} \qquad (6)$$

$I_H^h : \mathbb{R}^P \to \mathbb{R}^n$ : prolongation or disaggregation operator: 

$$I_H^h : \{(U_H)_j\}_{j=1,\ldots,P} \to V_h = \sum_{k=1}^{P} (U_H)_k \, \Pi_k t \qquad (7)$$

where $\Pi_k$ is the Euclidean orthogonal projection operator onto $\mathrm{span}\{e_i, \ i \in H_k\}$. With the 
help of these two operators, given a linear operator $L_h \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$, a coarse grid 
correction operator belonging to $\mathcal{L}(\mathbb{R}^P, \mathbb{R}^P)$ can be defined by 

$$L_H = I_h^H L_h I_H^h \qquad (8)$$

It is easy to check that the coefficients of the coarse grid matrix are given by 

$$(L_H)_{ij} = \langle\langle z, \Pi_i L_h \Pi_j t \rangle\rangle \qquad (9)$$

where $\langle\langle \cdot, \cdot \rangle\rangle$ is the inner product in $\mathbb{R}^n$. This type of method provides a 
very general setting to construct multi-level techniques. They are known as 
aggregation/disaggregation methods and have been introduced for problems in economics or 
social sciences where they appear in a very natural manner. AMG methods 
represent an improved variation of these methods, where the way to define the partitions 
and the transfer operators is deduced from an analysis of the matrix 
$A$ itself. 

Footnote 1: It would have been more consistent to define the restriction as 
$(I_h^H U_h)_j = \sum_{i \in H_j} \mathrm{mes}(C_i)\,(U_h)_i \,/\, \sum_{i \in H_j} \mathrm{mes}(C_i)$. 
We use definition (5) here because the interpretation of the volume agglomeration method as an 
algebraic one yields a simpler expression. 
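Formulas (6) to (9) can be checked on a small example. The sketch below builds the aggregation and disaggregation matrices from a partition and the weight vectors $z$ and $t$, forms the coarse operator by (8), and compares it with the entry-wise expression (9) written out as a double sum. The helper names, the four-unknown matrix and the 0-based indexing are assumptions made for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def aggregation_operators(partition, z, t, n):
    """Restriction (6) as a P x n matrix and prolongation (7) as an n x P matrix."""
    P = len(partition)
    R = [[z[k] if k in partition[j] else 0.0 for k in range(n)] for j in range(P)]
    Pm = [[t[k] if k in partition[j] else 0.0 for j in range(P)] for k in range(n)]
    return R, Pm


def coarse_entry(L, partition, z, t, i, j):
    """Formula (9): <<z, Pi_i L Pi_j t>> written as a sum over the block (H_i, H_j)."""
    return sum(z[p] * L[p][q] * t[q] for p in partition[i] for q in partition[j])


# Hypothetical fine matrix: 1-D three point stencil [-1, 2, -1] on 4 unknowns.
L_h = [[2.0, -1.0, 0.0, 0.0],
       [-1.0, 2.0, -1.0, 0.0],
       [0.0, -1.0, 2.0, -1.0],
       [0.0, 0.0, -1.0, 2.0]]
partition = [{0, 1}, {2, 3}]
z = t = [1.0, 1.0, 1.0, 1.0]      # the volume agglomeration choice

R, Pm = aggregation_operators(partition, z, t, 4)
L_H = matmul(matmul(R, L_h), Pm)  # formula (8)
L_H_check = [[coarse_entry(L_h, partition, z, t, i, j) for j in range(2)] for i in range(2)]
```

With $t = z = (1, \ldots, 1)$ each coarse entry is simply the sum of the corresponding block of the fine matrix.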

If we now interpret the Volume Agglomeration MG in an algebraic setting and 
construct the coarse grid operator in the “variational” way, it is easy to see that this 
technique is equivalent to an equation summing technique. Let $L_h$ be the matrix 
resulting from the fine grid discretisation, and suppose that $L_h$ is reordered in such a 
way that 

$$L_h = \begin{pmatrix} L_{1,1} & \cdots & L_{1,P} \\ \vdots & & \vdots \\ L_{P,1} & \cdots & L_{P,P} \end{pmatrix}$$

where $L_{i,j}$ is a $\mathrm{Card}(H_i) \times \mathrm{Card}(H_j)$ block matrix whose coefficients are $l_{p,q}$ for 
$p \in H_i$ and $q \in H_j$. The Volume Agglomeration method results in the choice $t = z = 
(1, 1, \ldots, 1, 1)^t$, and from (8) we see that the coefficients of the coarse grid matrix $L_H$ 
are defined by 

$$(L_H)_{ij} = \sum_{p \in H_i,\ q \in H_j} l_{p,q}$$

which exactly corresponds to summing all the entries belonging to the same block. 
Moreover, if the fine grid matrix results from a nearest neighbor stencil, it will also 
be the case for the coarse grid one. It is also well known that the “variational” way 
to construct coarse grid operators implies the preservation of the M-matrix property 
of the fine grid one. 

The analysis of this type of algebraic technique is extremely difficult because no 
reference is made to the differential equation to be solved or to the mesh on which the 
solution is sought. We note, however, that if the original problem has a differential 
background it is easy to recover functional information from any definition of algebraic 
transfer operators. For simplicity we consider that the restriction operator is defined 
as the adjoint of the prolongation operator and that the inner product on $\mathbb{R}^P$ is 
inherited from the one in $\mathbb{R}^n$ in the following sense: 

$$\langle U_H, V_H \rangle_H = \langle I_H^h U_H, I_H^h V_H \rangle_h$$

This will allow the algebraic theory to fit into the variational framework. We now 
simply invert the direction of the diagram displayed in Figure 1 to realize that any 
algebraic definition of a prolongation operator is equivalent to an implicit definition 
of a coarse grid space $M_H$ by setting 

$$\Gamma_H = \Gamma_h I_H^h : \mathbb{R}^P \to M_h$$

and 

$$M_H = \mathcal{R}(\Gamma_H)$$

Thus one has naturally $M_H \subset M_h$ with continuous injection, and we recover the 
framework of the nested spaces variational theory. 

$$\begin{array}{ccc} \Gamma_h I_H^h(\mathbb{R}^P) & \longrightarrow & M_h \\ \uparrow \Gamma_H & & \uparrow \Gamma_h \\ \mathbb{R}^P & \xrightarrow{\ I_H^h\ } & \mathbb{R}^n \end{array}$$

Figure 5: Diagram defining the coarse spaces. 

Remark 2: Here, we note that we have first defined the transfer operator and then 
have deduced the definition of the functional spaces associated with them. This is 
exactly the converse of remark 1. 

Moreover it is easy to get an explicit form of a basis of $M_H$ by: 

Proposition: Let $\{e_i\}_{i=1 \ldots P}$ be the canonical basis of $\mathbb{R}^P$, then the family $\{\Gamma_H(e_i)\}_{i=1 \ldots P}$ 
is a basis of $M_H$. 

The previous remark can help in the design or analysis of multigrid algorithms. 
Using it to analyse the volume agglomeration method applied to elliptic equations, we 
consider the fine level discretisation space as defined by the usual nodal basis functions 
$\varphi_k$ and we get: 

Proposition: The Volume Agglomeration MG method is equivalent to a nested 
variational method where the coarse grid space is generated by the basis functions: 

$$\varphi_j^H = \sum_{k \in H_j} \varphi_k \qquad \forall j = 1, \ldots, P \qquad (10)$$

We see that the coarse grid space $M_H$ is a very poor one; it does not even contain 
linear functions. This space is, thus, not dense in $H^1$, and this implies that the solution 
of the coarse grid problem is a very poor approximation of the fine grid solution. For 
instance, consider the 1-D case and define $L_h$ as the usual three point finite difference 
approximation of the Laplacian, i.e. in stencil notation $L_h = \frac{1}{h^2}[-1, 2, -1]$. With the 
notations of Figure 6, the prolongation operator $I_H^h$ is defined by 

$$(I_H^h(U_H))_{2i} = (U_H)_i \quad \text{and} \quad (I_H^h(U_H))_{2i+1} = (U_H)_i \qquad (11)$$

With (9), the coarse grid problem writes 

$$\frac{1}{H^2}[-1, 2, -1]\, U_H = \frac{1}{2} \cdot \frac{(R_h)_{2i} + (R_h)_{2i+1}}{2} \qquad (12)$$

Figure 6: VA-MG in 1-D. Fine cells 2i, 2i+1, 2i+2, 2i+3 on level h; coarse cells i, i+1 on level H. 
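The scaling of the 1-D coarse grid operator can be verified numerically. The sketch below agglomerates fine cells in pairs (2i, 2i+1), sums the matrix blocks as in the equation summing interpretation, and compares the result with the three point stencil on the coarse spacing H = 2h. The grid size, the Dirichlet-type matrix and the function names are assumptions chosen for illustration.

```python
def laplacian_1d(n, h):
    """Three point stencil (1/h^2)[-1, 2, -1] as a dense n x n matrix."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0 / h ** 2
        if i > 0:
            L[i][i - 1] = -1.0 / h ** 2
        if i < n - 1:
            L[i][i + 1] = -1.0 / h ** 2
    return L


def agglomerate_pairs(L):
    """Coarse matrix obtained by summing the 2 x 2 blocks of cells (2i, 2i+1)."""
    m = len(L) // 2
    return [[sum(L[p][q] for p in (2 * i, 2 * i + 1) for q in (2 * j, 2 * j + 1))
             for j in range(m)] for i in range(m)]


h = 0.125
H = 2 * h
L_H = agglomerate_pairs(laplacian_1d(8, h))
interior_row = L_H[1][0:3]                 # stencil of an interior coarse cell
ratio = L_H[1][1] / (2.0 / H ** 2)         # summed operator versus (1/H^2)[-1, 2, -1]
```

The summed operator is $(1/h^2)[-1, 2, -1]$, four times the consistent coarse stencil, while the summed right-hand side is only twice the mean fine residual; this mismatch is the inconsistency discussed next.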


As already noted in [12], a factor 2 is missing in the right-hand side, and the coarse 
grid operator is not a consistent approximation of the Laplacian. These problems are 
also well known in structured cell centered multigrid methods (see [7]). In the recent 
past, several remedies have been proposed to overcome this problem. In [12], the 
analysis of the 1-D example given previously was extended to 2-D structured meshes, 
and it was shown that a simple scaling allows a consistent approximation of the 
Laplace operator to be recovered. The coarse grid problems were then scaled by a 
factor of $2^k$ where $k$ is the level number. This strategy gave good results. It has been 
recently used in [13] for 2-D steady viscous flows with $k$-$\varepsilon$ turbulence modelling and 
found to be rather efficient from a practical point of view. An alternate approach has 
also been proposed in [9], where the prolongation operator is chosen by an Algebraic 
Multigrid heuristic and the coarse grid operator is defined by the variational method. 
However in practice, this strategy was too costly, and the numerical results reported 
in [9] use the same scaling strategy as in [12]. 

4 A prolongation operator for Volume Agglomeration MG 

For structured cell centered multigrid methods a simple remedy for the above problem 
consists in defining prolongation and restriction operators that have higher orders of 
accuracy. For instance in the 1-D case, the definition (11) is replaced by 

$$(I_H^h(U_H))_{2i} = \tfrac{3}{4}(U_H)_i + \tfrac{1}{4}(U_H)_{i-1}, \qquad (I_H^h(U_H))_{2i+1} = \tfrac{3}{4}(U_H)_i + \tfrac{1}{4}(U_H)_{i+1} \qquad (13)$$

This exactly corresponds to a linear interpolation between the barycenters of the 
coarse cells. 

For finite element nonstructured meshes, a similar approach can be considered. 
The set of the coarse cells $C_i$ (i = 1,...,n) constitutes a tiling of the domain Ω.
If we associate with each coarse cell $C_i$ a unique point $i \in C_i$ (for instance, the
gravity center of the cell), we can triangulate this set of points by any convenient
mesh generator algorithm. Although this approach will certainly be efficient, we see
that it has few advantages over the node-nested method. With the objective of
keeping the amount of geometrical information as small as possible, we try here a 
simplified variation of this approach that seems to give good results. 

Thus, let $J_H^h$ be a prolongation operator having the necessary degree of accuracy.
For instance, take $J_H^h$ as being the operator defined by a triangulation of the gravity
centers of the coarse cells. Then, there exists an n × n operator $a_h$ such that
$J_H^h = a_h I_H^h$, where $I_H^h$ is the straight injection (4). In a finite volume framework, $I_H^h$ is
the operator that takes a constant by cell function on the coarse grid and returns the
same constant by cell function on the fine grid. On the other hand, $J_H^h$ takes the same
constant by cell function on the coarse grid but returns a piecewise linear function.
Thus, $a_h$ can be interpreted as a reconstruction operator; it transforms a constant by
cell function into a piecewise linear function. As an example, let us consider the 1-D
case: $J_H^h$ is defined by expression (13), while $I_H^h$ is given by (11). (See Figure 6.) On
the interval defined by the gravity centers of the coarse cells i and i + 1, the operator
$a_h$ transforms the piecewise constant function whose values are $U_H^i$ on cell i and $U_H^{i+1}$
on cell i + 1 into the linear function

$$\Pi(x) = U_H^i + (x - x_{i+1/2})\,(U_H^{i+1} - U_H^i)/2h$$

and $a_h$ represents the interpolation of this function on the fine grid. From (11) and
(13), it is readily seen that the expression of $a_h$ is

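$$\begin{aligned}
(a_h u)_{2i} &= \tfrac{3}{4}u_{2i} + \tfrac{1}{4}u_{2i-1} \\
(a_h u)_{2i+1} &= \tfrac{3}{4}u_{2i+1} + \tfrac{1}{4}u_{2i+2}
\end{aligned}$$

or

$$(a_h u)_k = \tfrac{1}{2}u_k + \tfrac{1}{4}u_{k+1} + \tfrac{1}{4}u_{k-1}$$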
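The factorization of the linear prolongation (13) into the reconstruction operator applied after the straight injection (11) can be checked numerically. The following is only a sketch under stated assumptions: the function names are hypothetical, the grid has a small illustrative size, and at the two ends of the interval a missing neighbour is replaced by the cell itself, a boundary convention that the text does not specify.

```python
import numpy as np

def injection(nc):
    """Straight injection (11): fine cells 2i and 2i+1 copy coarse cell i."""
    P = np.zeros((2 * nc, nc))
    for i in range(nc):
        P[2 * i, i] = 1.0
        P[2 * i + 1, i] = 1.0
    return P

def linear_prolongation(nc):
    """Linear interpolation (13); a missing neighbour is replaced by the cell itself (assumption)."""
    P = np.zeros((2 * nc, nc))
    for i in range(nc):
        left = max(i - 1, 0)
        right = min(i + 1, nc - 1)
        P[2 * i, i] += 0.75
        P[2 * i, left] += 0.25
        P[2 * i + 1, i] += 0.75
        P[2 * i + 1, right] += 0.25
    return P

def reconstruction(nf):
    """a_h with (a_h u)_k = u_k/2 + u_{k-1}/4 + u_{k+1}/4, mirrored at both ends (assumption)."""
    a = np.zeros((nf, nf))
    for k in range(nf):
        a[k, k] += 0.5
        a[k, max(k - 1, 0)] += 0.25
        a[k, min(k + 1, nf - 1)] += 0.25
    return a

nc = 8                                   # illustrative number of coarse cells
I_inj = injection(nc)
J_lin = linear_prolongation(nc)
a_h = reconstruction(2 * nc)
factorization_holds = np.allclose(a_h @ I_inj, J_lin)
```

With these conventions the product of the reconstruction matrix and the injection matrix reproduces the linear prolongation row by row, which is the 1-D statement made in the text.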
Figure 7: 1-D reconstruction procedure.


Two-dimensional structured cases also reveal that $a_h$ can be interpreted as a
reconstruction operator and can be written as

$$(a_h u)_k = \sum_{j \in n(k)} a_j\, u_j$$

where n(k) is the set of neighbors of k, and the $a_j$ are coefficients that depend on the
geometry.

We then propose to use the same strategy for non-structured meshes. In order to
obtain a formula as easily as possible, we define the operator by

$$(a_h u)_k = \frac{\sum_{j \in n(k)} \mathrm{vol}(C_j)\, u_j}{\sum_{j \in n(k)} \mathrm{vol}(C_j)} \tag{14}$$

That is, we replace the geometric coefficients $a_j$ by a very crude weighting. Although
rather heuristic, this approach seems to give good results. We now report numerical 
results for solving the Laplace equation on a non-structured triangular mesh around 
a NACA airfoil with homogeneous Dirichlet boundary conditions. 
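The volume-weighted average (14) is simple to apply once the neighbor sets and cell volumes are known. The sketch below is an illustration only: the function name, the four-cell data, and the choice to list each cell in its own neighbor set are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def volume_weighted_reconstruction(u, vol, neighbors):
    """Apply (14): replace u_k by the volume-weighted mean of u over n(k)."""
    u = np.asarray(u, dtype=float)
    vol = np.asarray(vol, dtype=float)
    out = np.empty_like(u)
    for k, nk in enumerate(neighbors):
        idx = np.asarray(nk, dtype=int)
        out[k] = np.dot(vol[idx], u[idx]) / vol[idx].sum()
    return out

# Hypothetical data: four cells in a row; each n(k) lists k itself (assumption).
vol = [1.0, 2.0, 1.0, 2.0]
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
u = [1.0, 1.0, 4.0, 4.0]            # piecewise constant, as produced by straight injection
smoothed = volume_weighted_reconstruction(u, vol, neighbors)
constant_kept = volume_weighted_reconstruction([3.0] * 4, vol, neighbors)
```

Because the weights in each row sum to one, constant functions are reproduced exactly, while the jump between the two coarse values is spread over the cells next to the interface.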

Figure 8.a shows the convergence curves on three different meshes (i.e. 800, 3000
and 12000 nodes) obtained by a two grid method with full solution of the coarse 
grid problem and two Jacobi relaxations on the fine mesh. The straight injection 
(11) is used in this experiment. It is clear that the convergence factor becomes 
worse as the size of the mesh increases. On the other hand, Figure 8.b shows the 
results obtained for the same experiment using the improved prolongation (14). The 
convergence factor is much better, and it is clear that mesh-independent results are 
obtained. Finally, the same experiment is performed with improved prolongation (14) 
in a V-cycle setting using 7 different levels for the 12000-node triangulation, 6 for the 
3000-node triangulation and 5 for the 800-node triangulation (Figure 9). Again it is
seen that mesh independent results are obtained and that there is no decrease in the 
performance with respect to the 2-grid case. 
Figure 8: Two-grid cycle on three different meshes (800, 3000 and 12000 nodes)
without (a) and with (b) the improved prolongation.

Figure 9: V-cycle with the improved prolongation (multilevel on 3 NACA meshes, 1 relaxation on each level).
SUMMARY

A preconditioning theory for Schwarz methods is presented. The theory establishes sufficient 
conditions for multiplicative and additive Schwarz algorithms to yield self-adjoint positive def- 
inite preconditioners. It allows for the analysis and use of non-variational and non-convergent 
linear methods as preconditioners for conjugate gradient methods, and it is applied to domain 
decomposition and multigrid. This paper illustrates why symmetrizing may be a bad idea for 
linear methods. Numerical examples are presented for a test problem. 

INTRODUCTION 

In this paper, we consider additive and multiplicative Schwarz methods and their acceler- 
ation with Krylov methods, for the numerical solution of self-adjoint positive definite (SPD) 
operator equations arising from the discretization of elliptic partial differential equations. The 
standard theory of conjugate gradient acceleration of linear methods requires that a certain 
operator associated with the linear method — the preconditioner — be symmetric and positive 
definite. Often, however, as in the case of Schwarz-based preconditioners, the preconditioner 
is known only implicitly, and symmetry and positive definiteness are not easily verified. Here, 
we try to construct natural sets of sufficient conditions that are easily verified and do not re- 
quire the explicit formulation of the preconditioner. More precisely, we derive conditions for 
the constituent components of MG and DD algorithms (smoother, subdomain solver, trans- 
fer operators, etc.), that guarantee symmetry and positive definiteness of the preconditioning 
operator which is (explicitly or implicitly) defined by the resulting Schwarz method. We exam- 
ine the implications of these conditions for various formulations of the standard DD and MG 
algorithms. 

The outline of the paper is as follows. We begin in the next section by reviewing basic linear 
methods for SPD linear operator equations and by examining Krylov acceleration strategies. 
A simple lemma will illustrate why symmetrizing may be a bad idea for linear methods. In 
the third and fourth sections, we analyze multiplicative and additive Schwarz preconditioners. 
We develop a theory that establishes sufficient conditions for the multiplicative and additive 
algorithms to yield SPD preconditioners. This theory is used to establish sufficient conditions for 
multiplicative and additive DD and MG methods, and it allows for analysis of non-variational 
and even non-convergent linear methods as preconditioners. In the final section, we report 
results of numerical experiments with finite-element-based DD and MG methods applied to a 
difficult test problem with discontinuous coefficients to illustrate the theory and conjectures. 

1 This work was supported in part by the NSF under Cooperative Agreement No. CCR-9120008. 
LINEAR ITERATIVE METHODS


Notation. Let $\mathcal{H}$ be a real finite-dimensional Hilbert space equipped with the inner-product
$(\cdot,\cdot)$ inducing the norm $\|\cdot\| = (\cdot,\cdot)^{1/2}$. The adjoint of a linear operator $A \in L(\mathcal{H},\mathcal{H})$ with respect
to $(\cdot,\cdot)$ is the unique operator $A^T$ satisfying $(Au,v) = (u,A^Tv)$, $\forall u,v \in \mathcal{H}$. An operator A is
called self-adjoint or symmetric if $A = A^T$; a self-adjoint operator A is called positive definite or
simply positive if $(Au,u) > 0$, $\forall u \in \mathcal{H}$, $u \ne 0$. If A is self-adjoint positive definite (SPD), then
the bilinear form $(Au,v)$ defines another inner-product, which we denote as $(\cdot,\cdot)_A$. It induces
the norm $\|\cdot\|_A = (\cdot,\cdot)_A^{1/2}$.

The adjoint of an operator M with respect to $(\cdot,\cdot)_A$, the A-adjoint, is the unique operator
$M^*$ satisfying $(Mu,v)_A = (u,M^*v)_A$, $\forall u,v \in \mathcal{H}$. From this definition it follows that

$$M^* = A^{-1}M^TA. \tag{1}$$

M is called A-self-adjoint if $M = M^*$ and A-positive if $(Mu,u)_A > 0$, $\forall u \in \mathcal{H}$, $u \ne 0$.

If $N \in L(\mathcal{H}_1,\mathcal{H}_2)$, then $N^T \in L(\mathcal{H}_2,\mathcal{H}_1)$ is defined as the unique operator relating the
inner-products in $\mathcal{H}_1$ and $\mathcal{H}_2$ as follows:

$$(Nu,v)_{\mathcal{H}_2} = (u,N^Tv)_{\mathcal{H}_1}, \quad \forall u \in \mathcal{H}_1,\ \forall v \in \mathcal{H}_2. \tag{2}$$

Since it is usually clear from the arguments which inner-product is involved, we shall often drop 
the subscripts on inner-products (and norms) throughout the paper. 

We denote the spectrum of an operator M as $\sigma(M)$. The spectral theory for self-adjoint
linear operators states that the eigenvalues of the self-adjoint operator M are real and lie in
the closed interval $[\lambda_{\min}(M), \lambda_{\max}(M)]$ defined by the Rayleigh quotients:

$$\lambda_{\min}(M) = \min_{u \ne 0}\frac{(Mu,u)}{(u,u)}, \qquad \lambda_{\max}(M) = \max_{u \ne 0}\frac{(Mu,u)}{(u,u)}.$$

Similarly, if an operator M is A-self-adjoint, then its eigenvalues are real and lie in the interval
defined by the Rayleigh quotients generated by the A-inner-product. A well-known property is
that if M is self-adjoint, then the spectral radius of M, denoted as $\rho(M)$, satisfies $\rho(M) = \|M\|$.
This property can also be shown to hold in the A-norm for A-self-adjoint operators.

Lemma 1. If A is SPD and M is A-self-adjoint, then $\rho(M) = \|M\|_A$.
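Formula (1) and Lemma 1 lend themselves to a quick numerical check with matrices. The sketch below assumes the Euclidean inner-product on a small space and uses randomly generated test data; the helper names `a_inner`, `a_adjoint`, and `a_norm` are hypothetical and introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)               # SPD by construction

def a_inner(x, y):
    """A-inner-product (x, y)_A = (Ax, y)."""
    return (A @ x) @ y

def a_adjoint(M):
    """A-adjoint from formula (1): M* = A^{-1} M^T A."""
    return np.linalg.solve(A, M.T @ A)

def a_norm(M):
    """Operator A-norm, computed as the 2-norm of A^{1/2} M A^{-1/2}."""
    w, V = np.linalg.eigh(A)
    half = V @ np.diag(np.sqrt(w)) @ V.T
    half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.linalg.norm(half @ M @ half_inv, 2)

# Defining property of the A-adjoint for an arbitrary operator M.
M = rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)
lhs = a_inner(M @ u, v)
rhs = a_inner(u, a_adjoint(M) @ v)

# An A-self-adjoint operator: S = B A with B symmetric (test data only).
B = rng.standard_normal((n, n))
B = B + B.T
S = B @ A
rho_S = np.max(np.abs(np.linalg.eigvals(S)))
norm_S = a_norm(S)
```

Here `lhs` and `rhs` agree up to rounding, `S` coincides with its own A-adjoint, and its spectral radius matches its A-norm, as Lemma 1 states.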

Linear methods. Given the equation $Au = f$, where $A \in L(\mathcal{H},\mathcal{H})$ is SPD, consider the
preconditioned equation $BAu = Bf$, with $B \in L(\mathcal{H},\mathcal{H})$. The operator B, the preconditioner,
is usually chosen so that the linear iteration

$$u^{n+1} = u^n - BAu^n + Bf = (I - BA)u^n + Bf \tag{3}$$

has some desired convergence properties. The convergence of (3) is determined by the properties
of the so-called error propagation operator, $E = I - BA$.

We now state a series of simple lemmas that we shall use repeatedly in the following sections. 
Their short proofs and further references can be found in [5]. 





Lemma 2. If A is SPD, then BA is A-self-adjoint if and only if B is self-adjoint. 

Lemma 3. If A is SPD, then E is A-self-adjoint if and only if B is self-adjoint. 

Lemma 4. If A and B are SPD, then BA is A-SPD. 

Lemma 5. If A is SPD and B is self-adjoint, then $\|E\|_A = \rho(E)$.

Lemma 6. If $E^*$ is the A-adjoint of E, then $\|E\|_A^2 = \|EE^*\|_A$.

Lemma 7. If A and B are SPD and E is A-non-negative, then $\|E\|_A < 1$.

Lemma 8. If A is SPD and B is self-adjoint, and E is such that

$$-C_1(u,u)_A \le (Eu,u)_A \le C_2(u,u)_A, \quad \forall u \in \mathcal{H},$$

for $C_1 \ge 0$ and $C_2 \ge 0$, then $\rho(E) = \|E\|_A \le \max\{C_1, C_2\}$.

Lemma 9. If A and B are SPD, then Lemma 8 holds for some $C_2 < 1$.

The following lemma illustrates why symmetrizing is a bad idea for linear methods. It 
exposes the convergence rate penalty incurred by symmetrization of a linear method. 

Lemma 10. For any $E \in L(\mathcal{H},\mathcal{H})$ it holds that:

$$\rho(EE) \le \|EE\|_A \le \|E\|_A^2 = \|EE^*\|_A = \rho(EE^*).$$

Proof. The first and second inequalities hold for any norm. The first equality follows from 
Lemma 6, and the second follows from Lemma 1. □ 


Note that this is an inequality not only for the spectral radii but also for the A-norms of
the nonsymmetric and symmetrized error propagators. The lemma illustrates that one may 
actually see the differing convergence rates early in the iteration as well. 
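The chain of inequalities in Lemma 10 can be observed on a small model problem. The sketch below is an illustration under assumptions not made in the paper: A is the 1-D Laplacian stencil matrix of size 20, the nonsymmetric method is forward Gauss-Seidel, and its symmetrization pairs it with its A-adjoint (a backward sweep). Helper names are hypothetical.

```python
import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model operator (assumption)
B = np.linalg.inv(np.tril(A))                          # forward Gauss-Seidel: nonsymmetric B
E = np.eye(n) - B @ A                                  # error propagator of one sweep
E_star = np.linalg.solve(A, E.T @ A)                   # A-adjoint of E, formula (1)

def rho(M):
    """Spectral radius."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def a_norm(M):
    """Operator A-norm, computed as the 2-norm of A^{1/2} M A^{-1/2}."""
    w, V = np.linalg.eigh(A)
    half = V @ np.diag(np.sqrt(w)) @ V.T
    half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.linalg.norm(half @ M @ half_inv, 2)

rho_two_sweeps = rho(E @ E)            # two nonsymmetric sweeps
norm_two_sweeps = a_norm(E @ E)
norm_E_squared = a_norm(E) ** 2
norm_symmetrized = a_norm(E @ E_star)  # one symmetrized sweep pair
rho_symmetrized = rho(E @ E_star)
```

Both methods cost two sweeps per iteration, so comparing `rho_two_sweeps` with `rho_symmetrized` shows the penalty of symmetrization at equal work.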

Krylov acceleration of SPD linear methods. The conjugate gradient method was developed 
by Hestenes and Stiefel [4] as a method for solving linear systems Au = f, with SPD operators 
A. In order to improve convergence, it is common to precondition the linear system by an SPD 
preconditioning operator $B \approx A^{-1}$, in which case the generalized or preconditioned conjugate
gradient method results. Our goal in this section is to briefly review some relationships between 
the contraction number of a basic linear preconditioner and that of the resulting preconditioned 
conjugate gradient algorithm. 

We start with the well-known conjugate gradient contraction bound [3] 


$$\|e^{i+1}\|_A \le 2\left(1 - \frac{2}{1+\sqrt{\kappa_A(BA)}}\right)^{i+1}\|e^0\|_A = 2\,\delta_{cg}^{\,i+1}\,\|e^0\|_A,$$

where $\kappa_A(BA)$, the A-condition number of BA, is the ratio of extreme eigenvalues of BA.

The following result gives a bound on the condition number of the operator BA in terms 
of the extreme eigenvalues of the error propagator E = I - BA ; such bounds are often used in 
the analysis of linear preconditioners (cf. Proposition 5.1 in [9]). 
Lemma 11. If A and B are SPD and E is such that

$$-C_1(u,u)_A \le (Eu,u)_A \le C_2(u,u)_A, \quad \forall u \in \mathcal{H},$$

for $C_1 \ge 0$ and $C_2 \ge 0$, then the above must hold with $C_2 < 1$, and it follows that

$$\kappa_A(BA) \le \frac{1+C_1}{1-C_2}.$$

Remark 1. Even if a linear method is not convergent, it may still be a good preconditioner.
If $C_2 \ll 1$ and if $C_1 > 1$ does not become too large, then $\kappa_A(BA)$ will be small and the
conjugate gradient method will converge rapidly, even though the linear method diverges.
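Remark 1 can be made concrete with a deliberately over-scaled preconditioner. The sketch below is a constructed example, not one from the paper: A is a 1-D Laplacian stencil matrix and B is 2.2 times the inverse of a shifted copy of A, chosen so that the linear iteration (3) diverges while the condition number of BA stays moderate. The constants are read off the spectrum of E, using that BA has real positive eigenvalues when A and B are SPD.

```python
import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model operator (assumption)
B = 2.2 * np.linalg.inv(A + 0.3 * np.eye(n))           # SPD, deliberately "too large"

lam = np.sort(np.linalg.eigvals(B @ A).real)   # spectrum of BA (real and positive)
mu = 1.0 - lam                                 # spectrum of E = I - BA
C1 = max(0.0, -mu.min())                       # smallest admissible constants in Lemma 11
C2 = max(0.0, mu.max())

rho_E = np.max(np.abs(mu))                     # > 1 means the linear method diverges
kappa = lam[-1] / lam[0]                       # A-condition number of BA
kappa_bound = (1.0 + C1) / (1.0 - C2)          # bound of Lemma 11
delta_cg = 1.0 - 2.0 / (1.0 + np.sqrt(kappa))  # contraction number in the CG bound
kappa_A = np.linalg.cond(A)                    # unpreconditioned condition number
```

In this example `rho_E` exceeds one, yet `kappa` is far below `kappa_A`, so the conjugate gradient bound still predicts fast convergence.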

The next result connects the contraction number of the preconditioner to the contraction 
number of the preconditioned conjugate gradient method (see [10] for a proof). 

Lemma 12. If A and B are SPD and $\|I - BA\|_A \le \delta < 1$, then $\delta_{cg} < \delta$.

Krylov acceleration of nonsymmetric linear methods. The convergence theory of the conju- 
gate gradient iteration requires that the preconditioned operator BA be A-self-adjoint (see [1] 
for more general conditions), which from Lemma 2 requires that B be self-adjoint. If a Schwarz 
method is employed which produces a nonsymmetric operator B, then although A is SPD, the 
theory of the previous section does not apply and a nonsymmetric solver such as conjugate 
gradients on the normal equations [1], GMRES [6], CGS [7], or Bi-CGstab [8] must be used. 
Further on, we shall use the preconditioned Bi-CGstab algorithm to accelerate nonsymmetric 
Schwarz methods. In a sequence of numerical experiments, we shall compare the effectiveness 
of this approach with unaccelerated symmetric and nonsymmetric Schwarz methods, and with 
symmetric Schwarz methods accelerated with conjugate gradients. 

MULTIPLICATIVE SCHWARZ METHODS 

Consider a product operator of the form: 

$$E = I - BA = (I - \bar{B}_1A)(I - B_0A)(I - B_1A), \tag{4}$$

where $\bar{B}_1$, $B_0$, and $B_1$ are linear operators on $\mathcal{H}$, and where A is, as before, an SPD operator
on $\mathcal{H}$. We are interested in conditions for $\bar{B}_1$, $B_0$, and $B_1$, which guarantee that the implicitly
defined operator B is self-adjoint and positive definite and, hence, can be accelerated by using
the conjugate gradient method.

Lemma 13. Sufficient conditions for symmetry and positivity of operator B, defined by (4),
are:

1. $\bar{B}_1 = B_1^T$;

2. $B_0 = B_0^T$;

3. $\|I - B_1A\|_A < 1$;

4. $B_0$ non-negative on $\mathcal{H}$.


Proof. By Lemma 3, in order to prove symmetry of B, it is sufficient to prove that E is A-self-adjoint.
By using (1), we get

$$E^* = A^{-1}E^TA = (I - B_1^TA)(I - B_0^TA)(I - \bar{B}_1^TA),$$

which equals E by conditions 1 and 2.

Next, we prove that $(Bu,u) > 0$, $\forall u \in \mathcal{H}$, $u \ne 0$. Since A is non-singular, this is equivalent
to proving that $(BAu,Au) > 0$. Using condition 1, we have that

$$\begin{aligned}
(BAu,Au) &= ((I-E)u,Au) \\
&= (u,Au) - ((I - \bar{B}_1A)(I - B_0A)(I - B_1A)u,Au) \\
&= (u,Au) - ((I - B_0A)(I - B_1A)u,A(I - B_1A)u) \\
&= (u,Au) - ((I - B_1A)u,A(I - B_1A)u) + (B_0w,w),
\end{aligned}$$

where $w = A(I - B_1A)u$. By condition 4, we have that $(B_0w,w) \ge 0$. Condition 3 implies that
$((I - B_1A)u,A(I - B_1A)u) < (u,Au)$ for $u \ne 0$. Thus, the first two terms in the sum above
are together positive, while the third is non-negative, so that B is positive. □
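Lemma 13 can be verified numerically by forming the product (4) explicitly and recovering B from it. The sketch below uses illustrative choices that are assumptions, not the paper's test problem: A is a 1-D Laplacian stencil matrix, $B_1$ is a forward Gauss-Seidel sweep, $\bar{B}_1$ is its transpose, and $B_0$ is an exact correction on a piecewise-constant coarse space.

```python
import numpy as np

n = 16
I = np.eye(n)
A = 2 * I - np.eye(n, k=1) - np.eye(n, k=-1)       # SPD model operator (assumption)

B1 = np.linalg.inv(np.tril(A))                     # forward Gauss-Seidel
B1_bar = B1.T                                      # condition 1: adjoint on the way back
P = np.kron(np.eye(n // 2), np.ones((2, 1)))       # piecewise-constant prolongation
B0 = P @ np.linalg.solve(P.T @ A @ P, P.T)         # symmetric and non-negative (conditions 2, 4)

E = (I - B1_bar @ A) @ (I - B0 @ A) @ (I - B1 @ A) # product form (4)
B = (I - E) @ np.linalg.inv(A)                     # implicitly defined preconditioner

def a_norm(M):
    """Operator A-norm, computed as the 2-norm of A^{1/2} M A^{-1/2}."""
    w, V = np.linalg.eigh(A)
    half = V @ np.diag(np.sqrt(w)) @ V.T
    half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.linalg.norm(half @ M @ half_inv, 2)

contraction = a_norm(I - B1 @ A)                   # condition 3 requires this to be < 1
asymmetry = np.max(np.abs(B - B.T))
min_eig = np.min(np.linalg.eigvalsh((B + B.T) / 2))
```

In practice B is never formed; the explicit matrix is built here only so that its symmetry and smallest eigenvalue can be inspected.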


Multiplicative domain decomposition. Given the finite-dimensional Hilbert space $\mathcal{H}$, consider
J spaces $\mathcal{H}_k$, k = 1, ..., J, together with linear operators $I_k \in L(\mathcal{H}_k,\mathcal{H})$, null($I_k$) = {0}, such
that $I_k\mathcal{H}_k \subseteq \mathcal{H} = \sum_{k=1}^{J} I_k\mathcal{H}_k$. We also assume the existence of another space $\mathcal{H}_0$, an associated
operator $I_0$ such that $I_0\mathcal{H}_0 \subseteq \mathcal{H}$, and some linear operators $I^k \in L(\mathcal{H},\mathcal{H}_k)$, k = 0, ..., J. For
notational convenience, we shall denote the inner-products on $\mathcal{H}_k$ by $(\cdot,\cdot)$ (without explicit
reference to the particular space). Note that the inner products on different spaces need not
be related.

In a domain decomposition context, the spaces $\mathcal{H}_k$, k = 1, ..., J, are typically associated
with local subdomains of the original domain on which the partial differential equation is
defined. The space $\mathcal{H}_0$ is then a space associated with some global coarse mesh. The operators
$I_k$, k = 1, ..., J, are usually inclusion operators, while $I_0$ is an interpolation or prolongation
operator (as in a two-level MG method). The operators $I^k$, k = 1, ..., J, are usually orthogonal
projection operators, while $I^0$ is a restriction operator (again, as in a two-level MG method).

The error propagator of a multiplicative DD method on the space $\mathcal{H}$ employing the subspaces
$I_k\mathcal{H}_k$ has the general form [2]

$$E = I - BA = (I - I_J\bar{R}_JI^JA)\cdots(I - I_0R_0I^0A)\cdots(I - I_JR_JI^JA), \tag{5}$$

where $\bar{R}_k$ and $R_k$, k = 1, ..., J, are linear operators on $\mathcal{H}_k$ and $R_0$ is a linear operator on $\mathcal{H}_0$.
Usually the operators $\bar{R}_k$ and $R_k$ are constructed so that $\bar{R}_k \approx A_k^{-1}$ and $R_k \approx A_k^{-1}$, where $A_k$
is the operator defining the subdomain problem in $\mathcal{H}_k$. Similarly, $R_0$ is constructed so that
$R_0 \approx A_0^{-1}$. Actually, quite often $R_0$ is a "direct solve", i.e., $R_0 = A_0^{-1}$. The subdomain problem
operator $A_k$ is related to the restriction of A to $\mathcal{H}_k$. We say that $A_k$ satisfies the Galerkin
conditions or, in a finite element setting, that it is variationally defined when

$$A_k = I^kAI_k, \qquad I^k = I_k^T. \tag{6}$$
Recall that the superscript "T" is to be interpreted as the adjoint in the sense of (2), i.e., with
respect to the inner-products in $\mathcal{H}$ and $\mathcal{H}_k$.

Propagator (5) can be thought of as the product operator (4) by choosing

$$I - \bar{B}_1A = \prod_{k=J}^{1}(I - I_k\bar{R}_kI^kA), \qquad B_0 = I_0R_0I^0, \qquad I - B_1A = \prod_{k=1}^{J}(I - I_kR_kI^kA),$$

where $\bar{B}_1$ and $B_1$ are known only implicitly. This identification allows for the use of Lemma 13
to establish sufficient conditions on the subdomain operators $\bar{R}_k$, $R_k$, and $R_0$ to guarantee that
multiplicative domain decomposition yields an SPD operator B.

Theorem 1. Sufficient conditions for symmetry and positivity of the multiplicative domain
decomposition operator B, defined by (5), are:

1. $I^k = c_kI_k^T$, $c_k > 0$, k = 0, ..., J;

2. $\bar{R}_k = R_k^T$, k = 1, ..., J;

3. $R_0 = R_0^T$;

4. $\left\|\prod_{k=1}^{J}(I - I_kR_kI^kA)\right\|_A < 1$;

5. $R_0$ non-negative on $\mathcal{H}_0$.

Proof. We show that the conditions of Lemma 13 are satisfied. First, we prove that $\bar{B}_1 = B_1^T$,
which, by Lemma 3, is equivalent to proving that $(I - B_1A)^* = (I - \bar{B}_1A)$. By using (1), we
have

$$\left(\prod_{k=1}^{J}(I - I_kR_kI^kA)\right)^{*} = A^{-1}\left(\prod_{k=1}^{J}(I - I_kR_kI^kA)\right)^{T}A = \prod_{k=J}^{1}\left(I - (I^k)^TR_k^T(I_k)^TA\right),$$

which equals $(I - \bar{B}_1A)$ under conditions 1 and 2 of the theorem. The symmetry of $B_0$ follows
immediately from conditions 1 and 3; indeed,

$$B_0^T = (I_0R_0I^0)^T = (I^0)^TR_0^T(I_0)^T = (c_0I_0)R_0(c_0^{-1}I^0) = I_0R_0I^0 = B_0.$$

By condition 4 of the theorem, condition 3 of Lemma 13 holds trivially. The theorem follows
if one realizes that condition 4 of Lemma 13 is also satisfied, since

$$(B_0u,u) = (I_0R_0I^0u,u) = (R_0I^0u,I_0^Tu) = c_0^{-1}(R_0I^0u,I^0u) \ge 0, \quad \forall u \in \mathcal{H}.$$

□

Remark 2. Note that one sweep through the subdomains, followed by a coarse problem 
solve, followed by another sweep through the subdomains in reverse order, gives rise to an 
error propagator of the form (5). Also, note that no conditions are imposed on the nature of 
the operators A k associated with each subdomain. In particular, the theorem does not require 
that the variational conditions be satisfied. The theorem also does not require that the overall 
multiplicative DD method be convergent. 
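The sweep structure described in Remark 2 can be written as a routine that applies B to a residual without ever forming it. The sketch below is a minimal illustration under assumptions not taken from the paper: three overlapping index blocks on a 1-D Laplacian stencil matrix, inexact nonsymmetric subdomain solvers (one Gauss-Seidel sweep per block) whose transposes are used on the way back, and an exact solve on a piecewise-constant coarse space. The function name and data layout are hypothetical.

```python
import numpy as np

def apply_multiplicative_dd(A, r, blocks, R, P0, R0):
    """Return B r for propagator (5): sweep, coarse solve, reverse sweep with adjoints."""
    u = np.zeros_like(r)
    for k in reversed(range(len(blocks))):        # first sweep uses R_k, k = J, ..., 1
        idx = blocks[k]
        u[idx] += R[k] @ (r - A @ u)[idx]
    u += P0 @ (R0 @ (P0.T @ (r - A @ u)))         # coarse correction, I^0 = I_0^T
    for k in range(len(blocks)):                  # reverse sweep uses R_k^T, k = 1, ..., J
        idx = blocks[k]
        u[idx] += R[k].T @ (r - A @ u)[idx]
    return u

n = 16
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # SPD model operator (assumption)
blocks = [np.arange(0, 7), np.arange(5, 12), np.arange(10, 16)]
R = [np.linalg.inv(np.tril(A[np.ix_(b, b)])) for b in blocks]  # nonsymmetric inexact solvers
P0 = np.kron(np.eye(4), np.ones((4, 1)))                  # coarse prolongation I_0
R0 = np.linalg.inv(P0.T @ A @ P0)                         # exact coarse solve

# Assemble B column by column, for inspection only.
B = np.column_stack([apply_multiplicative_dd(A, e, blocks, R, P0, R0) for e in np.eye(n)])
asymmetry = np.max(np.abs(B - B.T))
min_eig = np.min(np.linalg.eigvalsh((B + B.T) / 2))
```

The subdomain solvers here are neither symmetric nor exact; using their transposes in the second sweep is alternative (3) of the options for condition 2 discussed in Remark 4.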
Remark 3. The results of the theorem apply for operators on general finite-dimensional
Hilbert spaces with arbitrary inner-products. They hold in particular for matrix operators on
$\mathbb{R}^n$, equipped with the Euclidean inner-product or the discrete $L^2$ inner-product. In the former
case the superscript "T" corresponds to the standard matrix transpose. In the latter case, the
matrix representation of the adjoint is a scalar multiple of the matrix transpose; the scalar
may be different from unity when the adjoint involves two different spaces, as in the case of
prolongation and restriction. This possible constant in the case of the discrete $L^2$ inner-product
is absorbed in the factor $c_k$ in condition 1. This allows for an easy verification of the conditions
of the theorem in an actual implementation, where the operators are represented as matrices
and where the inner-products do not explicitly appear in the algorithm.

Remark 4. Condition 1 of the theorem (with $c_k = 1$) for k = 1, ..., J is usually satisfied
trivially for domain decomposition methods. For k = 0, it may have to be imposed explicitly.
Condition 2 of the theorem allows for several alternatives which give rise to an SPD
preconditioner, namely: (1) use of exact subdomain solvers (if $A_k$ is a symmetric operator); (2) use
of identical symmetric subdomain solvers in the forward and backward sweeps; and (3) use of
the adjoint of the subdomain solver on the second sweep. Condition 3 is satisfied when the
coarse problem is symmetric and the solve is an exact one, which is usually the case. If not, the
coarse problem solve has to be symmetric. Condition 4 in Theorem 1 is clearly a non-trivial
one; it is essentially the assumption that the multiplicative DD method without a coarse space
is convergent. Condition 5 is satisfied, for example, when the coarse problem is SPD and the
solve is exact.

Multiplicative multigrid. Consider the Hilbert space $\mathcal{H}$ and J spaces $\mathcal{H}_k$ together with
operators $I_k \in L(\mathcal{H}_k,\mathcal{H})$, null($I_k$) = {0}, such that the spaces $I_k\mathcal{H}_k$ are nested and satisfy
$I_1\mathcal{H}_1 \subseteq I_2\mathcal{H}_2 \subseteq \cdots \subseteq I_{J-1}\mathcal{H}_{J-1} \subseteq \mathcal{H}_J = \mathcal{H}$. As before, we denote the $\mathcal{H}_k$-inner-products
by $(\cdot,\cdot)$, since it will be clear from the arguments which inner-product is intended. Again, the
inner-products are not necessarily related in any way. We assume also the existence of operators
$I^k \in L(\mathcal{H},\mathcal{H}_k)$.

In a multigrid context, the spaces $\mathcal{H}_k$ are typically associated with a nested hierarchy of
successively refined meshes, with $\mathcal{H}_1$ being the coarsest mesh and $\mathcal{H}_J$ being the fine mesh
on which the PDE solution is desired. The linear operators $I_k$ are prolongation operators,
constructed from given interpolation or prolongation operators that operate between subspaces,
i.e., $I_{k-1}^{k} \in L(\mathcal{H}_{k-1},\mathcal{H}_k)$. The operator $I_k$ is then constructed (only as a theoretical tool) as a
composite operator

$$I_k = I_{J-1}^{J}\,I_{J-2}^{J-1}\cdots I_{k+1}^{k+2}\,I_k^{k+1}, \quad k = 1, \ldots, J-1. \tag{7}$$

The composite restriction operators $I^k$, k = 1, ..., J − 1, are constructed similarly from some
given restriction operators $I_k^{k-1} \in L(\mathcal{H}_k,\mathcal{H}_{k-1})$. The coarse problem operators $A_k$ are related
to the restriction of A to $\mathcal{H}_k$. As in the case of DD methods, we say that $A_k$ is variationally
defined or satisfies the Galerkin conditions when conditions (6) hold. It is not difficult to see
that conditions (6) are equivalent to the following recursively defined variational conditions:

$$A_k = I_{k+1}^{k}A_{k+1}I_k^{k+1}, \qquad I_{k+1}^{k} = (I_k^{k+1})^T \tag{8}$$

when the composite operators $I_k$ appearing in (6) are defined as in (7).
In a finite element setting, conditions (8) can be shown to hold in ideal situations, for
both the stiffness matrices and the abstract weak form operators, for a nested sequence of 
successively refined finite element meshes. In the finite difference or finite volume method 
setting, conditions (8) must often be imposed algebraically, in a recursive fashion. 
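Imposing the variational conditions algebraically amounts to a short recursion from the finest level down. The sketch below assumes a 1-D vertex-centered hierarchy with linear interpolation, restriction taken as the transpose of prolongation, and level sizes 3, 7, 15; all of these, and the function names, are illustrative choices rather than the setup used in the paper. It also checks that the recursive definition agrees with the composite form built from the product of the level-to-level prolongations.

```python
import numpy as np

def linear_interpolation(nc):
    """1-D linear interpolation from nc interior nodes to 2*nc + 1 interior nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def galerkin_hierarchy(A_fine, prolongations):
    """Return [A_1, ..., A_J] using the recursion A_k = P^T A_{k+1} P of conditions (8)."""
    ops = [A_fine]
    for P in reversed(prolongations):
        ops.append(P.T @ ops[-1] @ P)
    return ops[::-1]

sizes = [3, 7, 15]                                        # hypothetical level sizes
prolongations = [linear_interpolation(nc) for nc in sizes[:-1]]
n = sizes[-1]
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # fine operator (assumption)
levels = galerkin_hierarchy(A, prolongations)

# Composite prolongation from the coarsest level to the finest one.
I_1 = prolongations[1] @ prolongations[0]
composite_gap = np.max(np.abs(levels[0] - I_1.T @ A @ I_1))
all_spd = all(np.min(np.linalg.eigvalsh(Ak)) > 0 for Ak in levels)
```

Since each prolongation has full column rank, every coarse operator produced this way inherits symmetry and positive definiteness from the fine operator.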

The error propagator of a multiplicative V-cycle MG method is defined implicitly as

$$E = I - BA = I - D_JA_J, \tag{9}$$

where $A_J = A$ and where operators $D_k$, k = 2, ..., J, are defined recursively:

$$I - D_kA_k = (I - \bar{R}_kA_k)(I - I_{k-1}^{k}D_{k-1}I_k^{k-1}A_k)(I - R_kA_k), \quad k = 2, \ldots, J, \tag{10}$$

$$D_1 = R_1. \tag{11}$$

Operators $\bar{R}_k$ and $R_k$ are linear operators on $\mathcal{H}_k$, usually called smoothers. The linear operators
$A_k \in L(\mathcal{H}_k,\mathcal{H}_k)$ define the coarse problems. They often satisfy the variational condition (8).
The error propagator (9) can be thought of as an operator of the form (4) with

$$\bar{B}_1 = \bar{R}_J, \qquad B_0 = I_{J-1}^{J}D_{J-1}I_J^{J-1}, \qquad B_1 = R_J.$$

Such an identification with the product method allows for the use of Lemma 13. The following
theorem establishes sufficient conditions for the subspace operators $\bar{R}_k$, $R_k$, and $A_k$ in order
to generate an (implicitly defined) SPD operator B that can be accelerated with conjugate
gradients.

Theorem 2. Sufficient conditions for symmetry and positivity of the multiplicative multigrid
operator B, implicitly defined by (9), (10), and (11), are

1. $A_k$ is SPD on $\mathcal{H}_k$, k = 2, ..., J;

2. $I_k^{k-1} = c_k(I_{k-1}^{k})^T$, $c_k > 0$, k = 2, ..., J;

3. $\bar{R}_k = R_k^T$, k = 2, ..., J;

4. $R_1 = R_1^T$;

5. $\|I - R_JA\|_A < 1$;

6. $\|I - R_kA_k\|_{A_k} \le 1$, k = 2, ..., J − 1;

7. $R_1$ non-negative on $\mathcal{H}_1$.

Proof. Since $\bar{R}_J = R_J^T$, we have that $\bar{B}_1 = B_1^T$, which gives condition 1 of Lemma 13. Now, $B_0$
is symmetric if and only if

$$B_0 = I_{J-1}^{J}D_{J-1}I_J^{J-1} = (c_J^{-1}I_J^{J-1})^TD_{J-1}^T(c_JI_{J-1}^{J})^T = B_0^T,$$

which holds under condition 2 and a symmetry requirement for $D_{J-1}$. We will prove that
$D_{J-1} = D_{J-1}^T$ by induction. First, $D_1 = D_1^T$ since $R_1 = R_1^T$. By Lemma 3 and condition 1, $D_k$
is symmetric if and only if $E_k = I - D_kA_k$ is $A_k$-self-adjoint. By using (1), we have that

$$\begin{aligned}
E_k^* &= A_k^{-1}\left[(I - \bar{R}_kA_k)(I - I_{k-1}^{k}D_{k-1}I_k^{k-1}A_k)(I - R_kA_k)\right]^TA_k \\
&= (I - \bar{R}_kA_k)(I - I_{k-1}^{k}D_{k-1}^TI_k^{k-1}A_k)(I - R_kA_k),
\end{aligned}$$

where we have used conditions 1, 2, and 3. Therefore, $E_k^* = E_k$ if $D_{k-1} = D_{k-1}^T$. Hence, the
result follows by induction on k.

Condition 3 of Lemma 13 follows trivially by condition 5 of the theorem. 

It remains to verify condition 4 of Lemma 13, namely that $B_0$ is non-negative. This is
equivalent to showing that $D_{J-1}$ is non-negative on $\mathcal{H}_{J-1}$. This will follow again from an
induction argument. First, note that $D_1 = R_1$ is non-negative on $\mathcal{H}_1$. Next, we prove that
$(D_kv_k,v_k) \ge 0$, $\forall v_k \in \mathcal{H}_k$, or, equivalently, since $A_k$ is non-singular, that $(D_kA_kv_k,A_kv_k) \ge 0$.

So, for all $v_k \in \mathcal{H}_k$,

$$\begin{aligned}
(D_kA_kv_k,A_kv_k) &= (A_kv_k,v_k) - (A_kE_kv_k,v_k) \\
&= (A_kv_k,v_k) - (A_k(I - I_{k-1}^{k}D_{k-1}I_k^{k-1}A_k)(I - R_kA_k)v_k,\,(I - R_kA_k)v_k) \\
&= (A_kv_k,v_k) - (A_k(I - R_kA_k)v_k,\,(I - R_kA_k)v_k) \\
&\qquad + (A_kI_{k-1}^{k}D_{k-1}I_k^{k-1}A_k(I - R_kA_k)v_k,\,(I - R_kA_k)v_k) \\
&= (v_k,v_k)_{A_k} - (S_kv_k,S_kv_k)_{A_k} + c_k^{-1}(D_{k-1}v_{k-1},v_{k-1}),
\end{aligned}$$

where $S_k = I - R_kA_k$ and $v_{k-1} = I_k^{k-1}A_k(I - R_kA_k)v_k \in \mathcal{H}_{k-1}$. By condition 6, the first two
terms add up to a non-negative value. Hence, $D_k$ is non-negative if $D_{k-1}$ is non-negative. □


Remark 5. As noted earlier in Remark 3, the conditions and conclusions of the theorem 
can be interpreted completely in terms of the usual matrix representations of the multigrid 
operators. 

Remark 6. Condition 1 of the theorem requires all but the coarsest grid operator to be SPD. 
This is easily satisfied when they are constructed either by discretization or by explicitly enforc- 
ing the Galerkin condition. Condition 2 requires restriction and prolongation to be adjoints, 
possibly multiplied by an arbitrary constant. Condition 3 of the theorem is satisfied when the 
number of pre-smoothing steps equals the number of post-smoothing steps and, in addition, 
one of the following is imposed: (1) use of the same symmetric smoother for both pre- and 
post-smoothing; or (2) use of the adjoint of the pre-smoothing operator as the post-smoother. 
Condition 4 requires a symmetric coarsest mesh solver. When the coarsest mesh problem is 
SPD, the symmetry of $R_1$ is satisfied when it corresponds to an exact solve (as is typical for MG
methods). Condition 5 is a convergence requirement on the fine space smoother. Condition 6
requires the coarse grid smoothers to be non-divergent. The non-negativity requirement for
$R_1$ is a non-trivial one; however, if $A_1$ is SPD, it is immediately satisfied when the operator
corresponds to an exact solve. 
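A V-cycle meeting the conditions discussed in Remark 6 can be sketched as a short recursion. The example below rests on assumptions chosen for illustration only: a 1-D vertex-centered hierarchy with linear interpolation, Galerkin coarse operators, restriction equal to the transpose of prolongation, one forward Gauss-Seidel sweep as pre-smoother, its adjoint (a backward sweep) as post-smoother, and an exact coarsest solve. Levels are numbered from 0 in the code, so level 0 plays the role of $\mathcal{H}_1$.

```python
import numpy as np

def linear_interpolation(nc):
    """1-D linear interpolation from nc interior nodes to 2*nc + 1 interior nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def v_cycle(k, r, A, P):
    """Return D_k r following recursion (10)-(11); level 0 is the coarsest level."""
    if k == 0:
        return np.linalg.solve(A[0], r)                 # exact, symmetric coarsest solve
    u = np.linalg.solve(np.tril(A[k]), r)               # pre-smoothing: forward Gauss-Seidel
    rc = P[k - 1].T @ (r - A[k] @ u)                    # restriction = prolongation transpose
    u = u + P[k - 1] @ v_cycle(k - 1, rc, A, P)         # coarse-grid correction
    u = u + np.linalg.solve(np.triu(A[k]), r - A[k] @ u)  # post-smoothing: adjoint sweep
    return u

sizes = [3, 7, 15, 31]                                  # hypothetical level sizes
P = [linear_interpolation(nc) for nc in sizes[:-1]]
n = sizes[-1]
A_levels = [2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)]
for Pk in reversed(P):                                  # Galerkin coarse operators
    A_levels.insert(0, Pk.T @ A_levels[0] @ Pk)

top = len(sizes) - 1
B = np.column_stack([v_cycle(top, e, A_levels, P) for e in np.eye(n)])
asymmetry = np.max(np.abs(B - B.T))
min_eig = np.min(np.linalg.eigvalsh((B + B.T) / 2))
```

The matrix B is assembled here only to inspect its symmetry and smallest eigenvalue; inside a conjugate gradient iteration one would call `v_cycle` on each residual instead.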


ADDITIVE SCHWARZ METHODS 
Consider a sum operator of the following form: 

$$E = I - BA = I - \omega(B_0 + B_1)A, \quad \omega > 0, \tag{12}$$

where, as before, A is an SPD operator and $B_0$ and $B_1$ are linear operators on $\mathcal{H}$.

Lemma 14. Sufficient conditions for symmetry and positivity of B, defined in (12), are 
1. $B_1$ is SPD in $\mathcal{H}$;

2. $B_0$ is symmetric and non-negative on $\mathcal{H}$.

Proof. We have that $B = \omega(B_0 + B_1)$, which is symmetric by the symmetry of $B_0$ and $B_1$.
Positivity follows since $(B_0u,u) \ge 0$ and $(B_1u,u) > 0$, $\forall u \in \mathcal{H}$, $u \ne 0$. □


Additive domain decomposition. We consider the space $\mathcal{H}$ and the J subspaces $I_k\mathcal{H}_k$ such
that $I_k\mathcal{H}_k \subseteq \mathcal{H} = \sum_{k=1}^{J} I_k\mathcal{H}_k$. Again, we allow for a "coarse" subspace $I_0\mathcal{H}_0 \subseteq \mathcal{H}$.

The error propagator of an additive DD method on the space $\mathcal{H}$ employing the subspaces
$I_k\mathcal{H}_k$ has the general form (see [10])

$$E = I - BA = I - \omega(I_0R_0I^0 + I_1R_1I^1 + \cdots + I_JR_JI^J)A. \tag{13}$$

The operators $R_k$ are constructed in such a way that $R_k \approx A_k^{-1}$, where the $A_k$ are the subdomain
problem operators. Propagator (13) can be thought of as the sum method (12) by taking
$B_0 = I_0R_0I^0$ and $B_1 = \sum_{k=1}^{J} I_kR_kI^k$. This identification allows for the use of Lemma 14
in order to establish conditions to guarantee that additive domain decomposition yields an
SPD preconditioner. Before we state the main theorem, we need the following lemma, which
characterizes the splitting of $\mathcal{H}$ into subspaces $I_k\mathcal{H}_k$ in terms of a positive splitting constant $S_0$.


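Lemma 15. Given any $v \in \mathcal{H}$, there exists a splitting $v = \sum_{k=1}^{J} I_kv_k$, $v_k \in \mathcal{H}_k$, and a
constant $S_0 > 0$ such that

$$\sum_{k=1}^{J}\|I_kv_k\|_A^2 \le S_0\|v\|_A^2. \tag{14}$$

Proof. Since $\sum_{k=1}^{J} I_k\mathcal{H}_k = \mathcal{H}$, we can construct subspaces $\mathcal{V}_k \subseteq \mathcal{H}_k$ such that $I_k\mathcal{V}_k \cap I_l\mathcal{V}_l = \{0\}$
for $k \ne l$ and $\mathcal{H} = \sum_{k=1}^{J} I_k\mathcal{V}_k$. Any $v \in \mathcal{H}$ can be decomposed uniquely as $v = \sum_{k=1}^{J} I_kv_k$,
$v_k \in \mathcal{V}_k$. Define the projectors $Q_k \in L(\mathcal{H},I_k\mathcal{V}_k)$ such that $Q_kv = I_kv_k$. Then,

$$\sum_{k=1}^{J}\|I_kv_k\|_A^2 = \sum_{k=1}^{J}\|Q_kv\|_A^2 \le \sum_{k=1}^{J}\|Q_k\|_A^2\,\|v\|_A^2.$$

Hence, the result follows with $S_0 = \sum_{k=1}^{J}\|Q_k\|_A^2$. □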


Theorem 3. Sufficient conditions for symmetry and positivity of the additive domain de- 
composition operator B, defined in (13), are 

1. $I^k = c_kI_k^T$, $c_k > 0$, k = 0, ..., J;

2. $R_k$ is SPD on $\mathcal{H}_k$, k = 1, ..., J;

3. $R_0$ is symmetric and non-negative on $\mathcal{H}_0$.

Proof. Symmetry of $B_0$ and $B_1$ follows trivially from the symmetry of $R_k$ and $R_0$ and from
$I^k = c_kI_k^T$. That $B_0$ is non-negative on $\mathcal{H}$ follows immediately from the non-negativity of $R_0$
on $\mathcal{H}_0$.
Finally, we prove positivity of $B_1$. Define $A_k = I^kAI_k$, k = 1, ..., J. By condition 1 and the
full rank nature of $I_k$, we have that $A_k$ is SPD. Now, since $R_k$ is also SPD, the product $R_kA_k$
is $A_k$-SPD. Hence, there exists an $\omega_0 > 0$ such that $0 < \omega_0 < \lambda_i(R_kA_k)$, k = 1, ..., J. This is
used together with (14) to bound the sum

$$\begin{aligned}
\sum_{k=1}^{J} c_k^{-1}(R_k^{-1}v_k,v_k) &= \sum_{k=1}^{J} c_k^{-1}(A_kA_k^{-1}R_k^{-1}v_k,v_k)
\le \sum_{k=1}^{J} c_k^{-1}(A_kv_k,v_k)\,\max_{v_k \ne 0}\frac{(A_kA_k^{-1}R_k^{-1}v_k,v_k)}{(A_kv_k,v_k)} \\
&\le \sum_{k=1}^{J} c_k^{-1}\omega_0^{-1}(A_kv_k,v_k) = \sum_{k=1}^{J}\omega_0^{-1}(AI_kv_k,I_kv_k) = \sum_{k=1}^{J}\omega_0^{-1}\|I_kv_k\|_A^2 \le \frac{S_0}{\omega_0}\|v\|_A^2,
\end{aligned}$$

with $v = \sum_{k=1}^{J} I_kv_k$. We can now employ this result to establish positivity of $B_1$:

$$\|v\|_A^2 = (Av,v) = \sum_{k=1}^{J}(Av,I_kv_k) = \sum_{k=1}^{J}(I_k^TAv,v_k) = \sum_{k=1}^{J}(R_k^{1/2}I_k^TAv,\,R_k^{-1/2}v_k).$$

By using the Cauchy-Schwarz inequality first in the $\mathcal{H}_k$-inner-product and then in $\mathbb{R}^J$, we have
that

$$\begin{aligned}
\|v\|_A^2 &\le \left(\sum_{k=1}^{J}(R_kR_k^{-1}c_k^{-1/2}v_k,\,R_k^{-1}c_k^{-1/2}v_k)\right)^{1/2}\left(\sum_{k=1}^{J}(R_kc_k^{1/2}I_k^TAv,\,c_k^{1/2}I_k^TAv)\right)^{1/2} \\
&\le \left(\frac{S_0}{\omega_0}\right)^{1/2}\|v\|_A\left(\sum_{k=1}^{J}(I_kR_kc_kI_k^TAv,\,Av)\right)^{1/2} = \left(\frac{S_0}{\omega_0}\right)^{1/2}\|v\|_A\,(B_1Av,Av)^{1/2}.
\end{aligned}$$

Finally, we divide by $\|v\|_A$ and square to obtain

$$(B_1Av,Av) \ge \frac{\omega_0}{S_0}\|v\|_A^2 > 0, \quad \forall v \in \mathcal{H},\ v \ne 0.$$

□


Remark 7. Condition 1 is naturally satisfied for k = 1, ..., J, with $c_k = 1$, since the
associated $I_k$ and $I^k$ are usually inclusion and orthogonal projection operators (which are natural
adjoints when the inner-products are inherited from the parent space, as in domain
decomposition). The fact that $I^0 = c_0I_0^T$ needs to be established explicitly. Condition 2 requires the
use of SPD subdomain solvers. The condition will hold, for example, when the subdomain
solve is exact and the subdomain problem operator is SPD. (The latter is naturally satisfied
by condition 1 and the full rank nature of $I_k$.) Finally, condition 3 is nontrivial and needs to
be checked explicitly. The condition holds when the coarse space problem operator is SPD and 
the solve is exact. Note that variational conditions are not needed for the coarse space problem 
operator. 

Additive multigrid. Given are the Hilbert space $\mathcal{H}$ and $J - 1$ nested subspaces
$I_k \mathcal{H}_k$ such that
$I_1 \mathcal{H}_1 \subset I_2 \mathcal{H}_2 \subset \cdots \subset I_{J-1} \mathcal{H}_{J-1}
\subset \mathcal{H}_J \equiv \mathcal{H}$. The operators $I_k$ and $I^k$ are the usual linear
operators between the different spaces, as in the previous sections.

The error propagator of an additive MG method is defined explicitly:

$$
E = I - BA = I - \omega (I_1 R_1 I^1 + I_2 R_2 I^2 + \cdots + I_{J-1} R_{J-1} I^{J-1} + R_J) A.
\qquad (15)
$$

This can be thought of as the sum method analyzed earlier by taking, up to the positive factor
$\omega$, $B_0 = \sum_{k=1}^{J-1} I_k R_k I^k$ and $B_1 = R_J$. This identification allows for the
use of Lemma 14 to establish sufficient conditions to guarantee that additive MG yields an SPD
preconditioner.

Theorem 4. Sufficient conditions for symmetry and positivity of the additive multigrid
operator $B$ defined in (15) are

1. $I^k = c_k I_k^T$, $c_k > 0$, $k = 1, \ldots, J - 1$;

2. $R_J$ is SPD in $\mathcal{H}$;

3. $R_k$ is symmetric non-negative in $\mathcal{H}_k$, $k = 1, \ldots, J - 1$.

Proof. Symmetry of $B_0$ and $B_1$ is obvious. $B_1$ is positive by condition 2. Non-negativity of
$B_0$ follows from

$$
(B_0 u, u) = \sum_{k=1}^{J-1} (I_k R_k (c_k I_k^T) u, u)
           = \sum_{k=1}^{J-1} c_k (R_k I_k^T u, I_k^T u) \ge 0,
\quad \forall u \in \mathcal{H},\ u \ne 0.
$$

□


Remark 8. Condition 1 of the theorem has to be imposed explicitly. Conditions 2 and 3
require the smoothers to be symmetric. The positivity of $R_J$ is satisfied when the fine grid
smoother is convergent, although this is not a necessary condition. The non-negativity of
$R_k$, $k < J$, has to be checked explicitly. When the coarse problem operators $A_k$ are SPD, this
condition is satisfied, for example, when the smoothers are non-divergent. Note that variational
conditions for the subspace problem operators are not required.
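
As a small numerical illustration of Theorem 4, the sketch below assembles a two-level additive
operator $B = I_1 R_1 I_1^T + R_J$ and checks symmetry and positivity. Everything specific in it is
an assumption made for the example and is not taken from the paper: the 3 x 3 model matrix, the
linear-interpolation prolongation, the Jacobi fine-level smoother, the exact coarse solve, and all
function names.

```python
# Hypothetical two-level example (J = 2) of the additive operator
# B = I_1 R_1 I_1^T + R_J, with c_1 = 1 so that I^1 = I_1^T.
import random

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]   # SPD fine-level operator (assumed model problem)
P = [0.5, 1.0, 0.5]      # prolongation I_1 from a one-dimensional coarse space


def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def additive_operator():
    """Assemble B = I_1 R_1 I_1^T + R_J as a dense 3 x 3 matrix."""
    a_coarse = dot(P, matvec(A, P))   # coarse operator I_1^T A I_1 (a scalar here)
    r_coarse = 1.0 / a_coarse         # exact coarse solve: symmetric, non-negative
    n = len(A)
    B = [[P[i] * r_coarse * P[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        B[i][i] += 1.0 / A[i][i]      # R_J = D^{-1} (Jacobi), which is SPD
    return B


B = additive_operator()
is_symmetric = all(abs(B[i][j] - B[j][i]) < 1e-14 for i in range(3) for j in range(3))

# Sample the Rayleigh quotient (Bu, u) / (u, u) on random vectors.
random.seed(0)
samples = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
min_quadratic_form = min(dot(matvec(B, u), u) / dot(u, u) for u in samples)
```

In this example the coarse term is only non-negative (it vanishes on vectors orthogonal to the
prolongation), so positivity of $B$ comes entirely from the SPD fine-level term, as in the proof.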

NUMERICAL RESULTS 

The Poisson-Boltzmann equation describes the electrostatic potential of a biomolecule lying
in an ionic solvent. This nonlinear elliptic equation for the dimensionless electrostatic potential
$u(\mathbf{r})$ has the form

$$
-\nabla \cdot (\epsilon(\mathbf{r}) \nabla u(\mathbf{r})) + \bar{\kappa}^2 \sinh(u(\mathbf{r}))
  = \left( \frac{4 \pi e_c^2}{k_B T} \right) \sum_{i=1}^{N_m} z_i \, \delta(\mathbf{r} - \mathbf{r}_i),
\quad \mathbf{r} \in \mathbb{R}^3, \quad u(\infty) = 0.
$$

The coefficients appearing in the equation are discontinuous by orders of magnitude. The
placement and magnitude of atomic charges are represented by source terms involving delta
functions. Analytical techniques are used to obtain boundary conditions on a finite domain
boundary.

We will compare several MG and DD methods for a two-dimensional, linearized form of
the Poisson-Boltzmann problem, modeling a molecule with three point charges. The surface
of the molecule is such that the discontinuities do not align with the coarsest mesh or with
the subdomain boundaries. Beginning with the coarse mesh shown on the left in Figure 1, we
uniformly refine the initial mesh of 10 elements (9 nodes) four times, leading to a fine mesh of
2560 elements (1329 nodes). Piecewise linear finite elements, combined with one-point Gaussian
quadrature, are used to discretize the problem. The three coarsest meshes used to formulate
the MG methods are given in Figure 1. For the DD methods, the subdomains, corresponding
to the initial coarse triangulation, are given a small overlap of one fine mesh triangle. The
DD methods also employ a coarse space constructed from the initial triangulation. Figure 2
shows three overlapping subdomains overlaying the initial coarse mesh. Computed results are
presented in Tables 1 to 4. Given for each experiment is the number of iterations required to
satisfy the error criterion (reduction of the A-norm of the error by $10^{-10}$). We report results
for the unaccelerated, CG-accelerated, and Bi-CGstab-accelerated methods. The execution time
differs for each method; normalized costs are tabulated in [5].

Figure 1: Example 1: Nested finite element meshes for MG.

Figure 2: Example 1: Overlapping subdomains for DD.
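
The mesh sizes quoted above can be checked with a short counting sketch. It assumes regular
refinement (every triangle is split into four by joining edge midpoints) and a simply connected
planar triangulation, so that the initial edge count follows from Euler's formula; neither
assumption is stated explicitly in the text, and the function names are illustrative.

```python
# Count nodes and elements under uniform refinement of a triangulation.
# Assumption: each refinement joins edge midpoints, so every edge contributes
# one new node and every triangle is replaced by four.

def refine_counts(nodes, edges, elements):
    """Return (nodes, edges, elements) after one uniform refinement."""
    new_nodes = nodes + edges                # one midpoint node per edge
    new_edges = 2 * edges + 3 * elements     # halved edges plus 3 interior edges per triangle
    new_elements = 4 * elements
    return new_nodes, new_edges, new_elements


def refinement_history(nodes, elements, levels):
    """List of (nodes, elements) for the initial mesh and each refined mesh."""
    # Euler's formula for a simply connected planar triangulation: V - E + F = 1.
    edges = nodes + elements - 1
    history = [(nodes, elements)]
    for _ in range(levels):
        nodes, edges, elements = refine_counts(nodes, edges, elements)
        history.append((nodes, elements))
    return history


history = refinement_history(nodes=9, elements=10, levels=4)
```

Under these assumptions the 10-element, 9-node mesh reaches 2560 elements and 1329 nodes after
four refinements.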

Multiplicative multigrid. The results for multiplicative V-cycle MG are presented in Table 1.
Each row corresponds to a different smoothing strategy and is annotated by $(\nu_1, \nu_2)$, with
$\nu_1$ pre-smoothing sweeps and $\nu_2$ post-smoothing sweeps. An “f” indicates the use of a
single forward Gauss-Seidel sweep, while a “b” denotes the use of the adjoint of the latter, i.e.,
a backward Gauss-Seidel sweep. Two series of results are given. For the first set, we explicitly
imposed the Galerkin conditions when constructing the coarse operators. In this case, the
multigrid algorithm is guaranteed to converge (cf. [5]). In the second series of tests
(corresponding to the numbers in parentheses) the coarse mesh operators are constructed using
standard finite element discretization. In that case, Galerkin conditions are not satisfied
everywhere due to coefficient discontinuities appearing within coarse elements; hence, the MG
method may diverge (DIV).
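
The “f” and “b” sweeps described above can be sketched for a small dense system as follows. This
is a minimal illustration, not the authors' implementation: the dense list-of-lists storage and
the function names are assumptions made for the example.

```python
def forward_gauss_seidel(A, b, x):
    """One forward sweep ("f"): update the unknowns in the order 0, 1, ..., n-1."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def backward_gauss_seidel(A, b, x):
    """One backward sweep ("b"): the same update in the order n-1, ..., 1, 0."""
    x = list(x)
    n = len(b)
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def smooth(A, b, x, pattern):
    """Apply a smoothing strategy such as "f", "fb", or "ffbb"; "0" means no sweep."""
    for sweep in pattern:
        if sweep == "f":
            x = forward_gauss_seidel(A, b, x)
        elif sweep == "b":
            x = backward_gauss_seidel(A, b, x)
    return x
```

For a symmetric matrix the backward sweep is the adjoint of the forward sweep, which is why a
strategy such as (f,b) yields a symmetric cycle while (f,f) does not.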

The unaccelerated MG results clearly illustrate the symmetry penalty given in Lemma 10.

Table 1: Example 1: Multiplicative MG with variational (discretized) coarse problem

  $\nu_1$   $\nu_2$   UNACCEL      CG             Bi-CGstab
  f         0         65 (DIV)     >100 (>100)    14 (16)
  f         b         55 (DIV)     16 (18)        10 (15)
  f         f         40 (31)      30 (>100)      9 (9)
  ff        0         39 (48)      >100 (>100)    8 (10)
  fb        0         53 (DIV)     >100 (>100)    10 (11)
  0         ff        39 (29)      29 (>100)      8 (9)
  0         fb        53 (DIV)     17 (99)        10 (12)
  fb        fb        34 (27)      12 (13)        8 (8)
  ff        bb        28 (18)      11 (11)        7 (7)
  ff        ff        24 (15)      12 (12)        6 (6)
  fff       f         24 (15)      17 (27)        6 (6)
  ffff      0         25 (17)      >100 (>100)    7 (6)


Table 2: Example 1: Multiplicative DD with variational (discretized) coarse problem

  Accel.      subdomain solve   forw           forw/back    forw/forw
  UNACCEL     exact             40 (42)        38 (39)      20 (21)
              symmetric         279 (282)      146 (149)    140 (141)
              adjointed         -              110 (112)    102 (103)
              nonsymmetric      189 (191)      102 (104)    95 (96)
  CG          exact             >500 (>500)    13 (13)      20 (20)
              symmetric         140 (56)       24 (24)      29 (27)
              adjointed         -              21 (21)      25 (26)
              nonsymmetric      135 (83)       22 (23)      28 (28)
  Bi-CGstab   exact             9 (9)          9 (9)        6 (6)
              symmetric         23 (23)        17 (16)      16 (16)
              adjointed         -              14 (14)      14 (13)
              nonsymmetric      19 (20)        13 (13)      13 (13)


The nonsymmetric methods are always superior to the symmetric ones (the cases (f,b), (ff,bb),
and (fb,fb)). Note that minimal symmetry (ff,bb) leads to a better convergence than maximal
symmetry (fb,fb). The correctness of Lemma 10 is illustrated by noting that two iterations of the
(f,0) strategy are actually faster than one iteration of the (f,b) strategy; also, compare the (ff,0)
strategy to the (ff,bb) one. The CG-acceleration leads to a guaranteed reduction in iteration
count for the symmetric preconditioners (see Lemma 12). We observe that the unaccelerated
method need not be convergent for CG to be effective. CG appears to also accelerate some
nonsymmetric linear methods. Yet, it seems difficult to predict failure or success beforehand in
such cases. The most robust method appears to be the Bi-CGstab method. Note the tendency
to favor the nonsymmetric V-cycle strategies. Overall, the fastest method proves to be the
Bi-CGstab-acceleration of a (very nonsymmetric) V(1,0)-cycle.

Table 3: Example 1: Additive MG with variational (discretized) coarse problem

  $\nu$     UNACCEL        CG             Bi-CGstab
  f         175 (>1000)    >100 (>100)    23 (52)
  ff        110 (>1000)    119 (168)      19 (43)
  fb        146 (>1000)    34 (54)        23 (49)
  ffff      95 (>1000)     28 (67)        17 (37)
  ffbb      100 (>1000)    27 (47)        17 (34)
  fbfb      95 (>1000)     28 (48)        20 (43)

Table 4: Example 1: Additive DD with variational (discretized) coarse problem

  subdomain solve   UNACCEL          CG         Bi-CGstab
  exact             >1000 (>1000)    34 (34)    25 (27)
  symmetric         >1000 (>1000)    57 (57)    50 (49)
  nonsymmetric      >1000 (>1000)    69 (65)    38 (41)

Multiplicative domain decomposition. Results for multiplicative DD are given in Table 2. In
the column “forw” the iteration counts reported were obtained with a single sweep through the
subdomains on each multiplicative DD iteration. The other columns correspond to a symmetric
forward/backward sweep or to two forward sweeps. Four different subdomain solvers are used:
an exact solve, a symmetric method consisting of two symmetric Gauss-Seidel iterations, a
nonsymmetric method consisting of four Gauss-Seidel iterations, and, finally, a method using
four forward Gauss-Seidel iterations in the forward subdomain sweep and using their adjoint
(i.e., four backward Gauss-Seidel iterations) in the backward subdomain sweep. The latter leads
to a symmetric iteration; see Remark 2. Note that the cost of the three inexact subdomain
solvers is identical.

Although DD is apparently not as sensitive to operator symmetries as MG, the same conclusions
can be drawn for DD as for MG. In particular, the symmetry penalty is seen for the pure
DD results. Lemma 10 is confirmed since two iterations in the column “forw” are always more
efficient than one iteration of the corresponding method in column “forw/back.” The CG results
indicate that using minimal symmetry (the “adjointed” rows of Table 2) is a more effective
approach than the fully symmetric one (the “symmetric” rows). The most robust acceleration is
the Bi-CGstab one.

Additive multigrid. Results obtained with an additive multigrid method are reported in
Table 3. The number and nature of the smoothing sweeps are given in the first column of the
table.

In the case of an unaccelerated additive method, the selection of a good damping parameter
is crucial for convergence of the method. We did not search extensively for an optimal
parameter; a selection of $\omega = 0.45$ seemed to provide good results in the case when the
coarse problem was variationally defined. No $\omega$-value leading to satisfactory convergence
was found in the case when the coarse problems were obtained by discretization. In the case of CG
acceleration the observed convergence behavior was completely independent of the choice of
$\omega$; see Remark 2. The symmetric methods ($\nu$ = fb, ffbb, fbfb) are accelerated very well.
Some of the nonsymmetric methods are accelerated too, especially when the number of smoothing
steps is sufficiently large. The best method overall appears to be the Bi-CGstab acceleration of
the nonsymmetric multigrid method with a single forward Gauss-Seidel sweep on each grid level.

Additive domain decomposition. The results for additive DD are given in Table 4. The
subdomain solver is either an exact solver, a symmetric solver based on two symmetric
(forward/backward) Gauss-Seidel sweeps, or a nonsymmetric solver based on four forward
Gauss-Seidel iterations. No value of $\omega$ was found that led to satisfactory convergence of
the unaccelerated method. The CG-acceleration performs well when the linear method is symmetric
and worse if nonsymmetric. Again, the best overall method is the Bi-CGstab-acceleration of the
nonsymmetric additive solver.


REFERENCES 

[1] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor. A taxonomy for conjugate gradient
methods. SIAM J. Numer. Anal., 27(6):1542-1568, 1990.

[2] M. Dryja and O. B. Widlund. Towards a unified theory of domain decomposition
algorithms for elliptic problems. In Third International Symposium on Domain Decomposition
Methods for Partial Differential Equations, T. F. Chan, R. Glowinski, J. Periaux, and
O. B. Widlund, eds. SIAM, Philadelphia, PA, pp. 3-21, 1989.

[3] W. Hackbusch. Iterative Solution of Large Sparse Systems of Equations. Springer-Verlag,
Berlin, Germany, 1994.

[4] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems.
J. Research of NBS, 49:409-435, 1952.

[5] M. Holst and S. Vandewalle. Schwarz methods: to symmetrize or not to symmetrize. SIAM
J. Numer. Anal. (to appear).

[6] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7(3):856-869, 1986.

[7] P. Sonneveld. CGS: A fast Lanczos-type solver for nonsymmetric linear systems. SIAM J.
Sci. Statist. Comput., 10:36-52, 1989.

[8] H. A. van der Vorst. BI-CGSTAB: A fast and smoothly converging variant of BI-CG for the
solution of nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 13(2):631-644,
1992.

[9] J. Xu. Theory of Multilevel Methods. Ph.D. thesis, Technical Report AM 48, Department
of Mathematics, Penn State University, University Park, PA, July 1989.

[10] J. Xu. Iterative methods by space decomposition and subspace correction. SIAM Review,
34(4):581-613, 1992.





A Mixed Finite Volume Element Method for Flow Calculations in Porous Media


Jim E. Jones

Institute for Computer Applications in Science and Engineering
NASA Langley Research Center


SUMMARY 


A key ingredient in the simulation of flow in porous media is the accurate
determination of the velocities that drive the flow. The large scale irregularities of
the geology, such as faults, fractures, and layers, suggest the use of irregular grids in
the simulation. Work has been done in applying the finite volume element (FVE)
methodology as developed by McCormick in conjunction with mixed methods which
were developed by Raviart and Thomas. The resulting mixed finite volume element
discretization scheme has the potential to generate more accurate solutions than
standard approaches. The focus of this paper is on a multilevel algorithm for solving the
discrete mixed FVE equations. The algorithm uses a standard cell centered finite
difference scheme as the ‘coarse’ level and the more accurate mixed FVE scheme as
the ‘fine’ level. The algorithm appears to have potential as a fast solver for large size
simulations of flow in porous media.


The Mixed Finite Volume Element Discretization 


In this first section, we briefly introduce the mixed finite volume element (FVE)
discretization technique. We will not dwell too much on the details of the discretization
itself as our focus here is on solving the discrete set of equations that the
discretization produces; a detailed description of the discretization can be found in [7].

We begin by considering the following partial differential equation defined on a
domain $\Omega$ in $\mathbb{R}^2$:

$$
\begin{cases}
-\nabla \cdot A(\mathbf{x}) \nabla \phi(\mathbf{x}) = f(\mathbf{x}), & \mathbf{x} \in \Omega, \\
\nabla \phi(\mathbf{x}) \cdot \eta = g(\mathbf{x}), & \mathbf{x} \in \partial\Omega.
\end{cases}
\qquad (1)
$$

Here we assume the diffusion coefficient $A$ is diagonal, but values of the coefficients
may jump orders of magnitude at material interfaces. In the context of reservoir
simulation, this is the pressure equation for incompressible single-phase flow where
$\phi$ is the pressure in the reservoir $\Omega$, and the boundary condition specifies the flux
on $\partial\Omega$. As one of our goals for the new discretization is accurate approximations of
flow velocities, we will begin by reformulating this equation as a first order system
of equations where velocity appears explicitly in the equations. This is done by
introducing the flow velocity variables via the definition,

$$
\mathbf{v} = -A \nabla \phi, \qquad (2)
$$

and then rewriting the partial differential equation in (1) as,

$$
\nabla \cdot \mathbf{v} = f. \qquad (3)
$$

In the context of reservoir simulation, definition (2) is Darcy’s law and equation (3)
is the mass conservation law. In reservoir simulation, this same approach of treating
flow velocity explicitly has been used in mixed finite-element methods with
considerable success [5], [6], [13]. Equations (2) and (3) along with the boundary condition
from equation (1) represent the first order system that we discretize using the mixed
FVE method. Because of the irregularity of reservoir geology (faults, layers, etc.),
uniform rectangular grids are not adequate in modeling the flow. The mixed FVE
discretization was developed for a logically rectangular grid of irregular quadrilaterals.
An example of such a grid is shown in Figure 1. To discretize this system, we
follow the finite volume element (FVE) principles developed in [3], [8], [9]. The two
major components of any FVE discretization scheme are a choice of control volumes
to integrate the continuous equation over and a choice of finite element spaces for the
unknowns.

Figure 1

Figure 2

Important in developing the discretization for general quadrilaterals is the mapping
relating a general quadrilateral to a reference one. Consider the quadrilateral $P$
with vertices $(x_{00}, y_{00})$, $(x_{10}, y_{10})$, $(x_{01}, y_{01})$, and $(x_{11}, y_{11})$ shown in
Figure 2. Let the reference quadrilateral $\hat{P}$ be the unit square. Then there is a unique
bilinear mapping of $\hat{P}$ onto $P$ given by,

$$
x(\hat{x}, \hat{y}) = x_{00} + (x_{10} - x_{00}) \hat{x} + (x_{01} - x_{00}) \hat{y}
                    + (x_{11} - x_{10} - x_{01} + x_{00}) \hat{x} \hat{y},
$$

$$
y(\hat{x}, \hat{y}) = y_{00} + (y_{10} - y_{00}) \hat{x} + (y_{01} - y_{00}) \hat{y}
                    + (y_{11} - y_{10} - y_{01} + y_{00}) \hat{x} \hat{y}.
$$

If $P$ is convex, then this mapping has an inverse. We restrict ourselves to convex
quadrilaterals, so for each $(x, y) \in P$ we have an associated point
$(\hat{x}, \hat{y}) \in \hat{P}$. Shown in Figure 2 are several vectors that will be useful later in
describing the components of our discretization technique. For each $(x, y) \in P$ we define
four vectors.

$X(x, y)$ is the image of the unit vector (1,0) in $\hat{P}$,
$Y(x, y)$ is the image of the unit vector (0,1) in $\hat{P}$,
$\eta_x(x, y)$ is a unit vector orthogonal to $Y(x, y)$,
$\eta_y(x, y)$ is a unit vector orthogonal to $X(x, y)$.
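
The bilinear mapping and the four vectors just defined can be sketched in a few lines. The
function names and the vertex ordering of the argument are choices made for this illustration, and
the orientation of the two unit normals (chosen so that they reduce to (1,0) and (0,1) on the unit
square) is an assumption, since the text only requires orthogonality.

```python
import math


def bilinear_map(verts, xh, yh):
    """Map (xh, yh) in the unit square to the quadrilateral.

    verts = ((x00, y00), (x10, y10), (x01, y01), (x11, y11)).
    """
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = verts
    x = x00 + (x10 - x00) * xh + (x01 - x00) * yh + (x11 - x10 - x01 + x00) * xh * yh
    y = y00 + (y10 - y00) * xh + (y01 - y00) * yh + (y11 - y10 - y01 + y00) * xh * yh
    return x, y


def mapping_vectors(verts, xh, yh):
    """Return X, Y (images of the unit vectors) and unit vectors eta_x, eta_y."""
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = verts
    cx = x11 - x10 - x01 + x00
    cy = y11 - y10 - y01 + y00
    # Columns of the Jacobian of the mapping are the images of (1,0) and (0,1).
    X = (x10 - x00 + cx * yh, y10 - y00 + cy * yh)
    Y = (x01 - x00 + cx * xh, y01 - y00 + cy * xh)
    norm_x = math.hypot(*X)
    norm_y = math.hypot(*Y)
    eta_x = (Y[1] / norm_y, -Y[0] / norm_y)   # orthogonal to Y (assumed orientation)
    eta_y = (-X[1] / norm_x, X[0] / norm_x)   # orthogonal to X (assumed orientation)
    return X, Y, eta_x, eta_y
```

For example, with the convex quadrilateral `((0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 2.0))` the
reference corner (1, 1) is mapped to the vertex (3, 2).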
Figure 3


For the finite element spaces we use the lowest order Raviart-Thomas elements
on the quadrilateral elements; see [2], [14], and [11]. They can be defined as follows.
The characteristic functions of the quadrilaterals provide a basis for the finite element
space for $\phi$. The basis functions for $\mathbf{v}$ are best seen by associating degrees of
freedom with normal components on edges of quadrilaterals. A typical basis function for
the finite element space for $\mathbf{v}$ has support on two adjacent quadrilaterals and has a
constant normal component on the edge shared by the quadrilaterals, and its normal
component is zero on other edges. The magnitude of the basis function is such that
the flux on the common edge is one.

These conditions alone do not uniquely determine the basis function; the following
additional condition on the finite element space is needed. Within any quadrilateral $P$,

$\mathbf{v} \cdot \eta_x \|Y\|$ varies linearly with $\hat{x}$, constant with $\hat{y}$,

$\mathbf{v} \cdot \eta_y \|X\|$ varies linearly with $\hat{y}$, constant with $\hat{x}$.

A typical basis function is represented in Figure 3. We note that the basis functions
have continuous normal components across grid interfaces. With this we can guarantee
that our computed flow velocity will also have a continuous normal component
across grid edges. The true physical solution also has this property (continuous normal
component of velocities), but not every numerical scheme for approximating it
does, as pointed out in [12].

We now need to choose the control volumes. The quadrilaterals used to describe
the grid are the natural choice for the control volumes for equation (3). This will
produce a scheme with a local conservation property on these quadrilateral grid cells.
So we integrate equation (3) over each grid cell $P_{i,j}$,

$$
\int_{P_{i,j}} \nabla \cdot \mathbf{v} \, dx \, dy = \int_{P_{i,j}} f.
$$

Applying the divergence theorem, we get

$$
\int_{\partial P_{i,j}} \mathbf{v} \cdot \eta \, ds = \int_{P_{i,j}} f.
$$

The left-hand side of this equation is just the sum of the fluxes on edges of $P_{i,j}$, so
the discretization of the mass conservation equation is

$$
u^h_{i+1,j} - u^h_{i,j} + v^h_{i,j+1} - v^h_{i,j} = \int_{P_{i,j}} f. \qquad (4)
$$

Here $u^h_{i+1,j}$ and $u^h_{i,j}$ denote the discrete fluxes on the ‘east’ and ‘west’ edges of the
grid cell, respectively. Similarly, $v^h_{i,j+1}$ and $v^h_{i,j}$ denote the discrete fluxes on the
‘north’ and ‘south’ edges of the grid cell, respectively. If we assume $f$ is (approximated by)
a function that is piecewise constant, we can replace the integral on the right hand
side by

$$
f_{i,j} \times \mathrm{AREA}(P_{i,j}).
$$

If we have more information about $f$, we can use a more accurate approximation
of the integral. In choosing the control volumes for Darcy’s equation, we use the
following control volumes which straddle grid edges. Consider two adjacent grid
cells, $P_{i-1,j}$ and $P_{i,j}$. $U_{i,j}$ then consists of the image of $(1/2, 1) \times (0, 1)$ under
the mapping for $P_{i-1,j}$ and the image of $(0, 1/2) \times (0, 1)$ under the mapping for
$P_{i,j}$. In Figure 4, $U_{i,j}$ is the shaded region. We associate this volume with the ‘vertical’
edge shared by $P_{i-1,j}$ and $P_{i,j}$ which the control volume straddles. We also have control
volumes associated with ‘horizontal’ edges. For adjacent grid cells, $P_{i,j-1}$ and $P_{i,j}$,
$V_{i,j}$ consists of the image of $(0, 1) \times (1/2, 1)$ under the mapping for $P_{i,j-1}$ and the
image of $(0, 1) \times (0, 1/2)$ under the mapping for $P_{i,j}$. The discretization of Darcy’s
equation proceeds as follows. We dot equation (2) with $c_l X(x, y)$ and integrate over
the ‘left half’ of $U_{i,j}$. Similarly, we dot equation (2) with $c_r X(x, y)$ and integrate
over the ‘right half’ of $U_{i,j}$. Here $c_l$ and $c_r$ are scaling constants chosen in such a
way as to eliminate integral terms on the interface between $P_{i-1,j}$ and $P_{i,j}$ where
$\phi^h$ is undefined. We then add the two integrals to get our final result. We will present
here only the form of the equation that this integration gives rise to. Note that we
perform the same kind of integrations for the $V$ volumes as well, only here we dot
Darcy’s law with a scaling of the vector $Y$. For the $U$ volume shown in Figure 4, we
get a discrete Darcy equation relating the pressure drop between the two cells to the
fluxes on cell edges,

$$
c_1 u^h_{i-1,j} + c_2 u^h_{i,j} + c_3 u^h_{i+1,j}
  + c_4 v^h_{i-1,j} + c_5 v^h_{i-1,j+1} + c_6 v^h_{i,j} + c_7 v^h_{i,j+1}
  + |E| \, (\phi^h_{i,j} - \phi^h_{i-1,j}) = 0. \qquad (5)
$$

Figure 4

Here $|E|$ is the length of the edge shared by the two adjacent grid cells. The values of
the coefficients $c_1, \ldots, c_7$ depend on the position of the vertices defining the two grid
cells and on the values of the diffusion coefficient within the two cells. The ‘cross’
terms, $c_4, \ldots, c_7$, will generally be nonzero even when the diffusion coefficient $A$ is
diagonal. In summary, for each grid cell we have a discrete conservation equation of
the form of equation (4) and for each grid edge we have a discrete Darcy equation of
the form of equation (5).
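
As a rough illustration of the cell-wise conservation equation (4) with a piecewise constant
source, the sketch below evaluates its residual on a logically rectangular grid. The array layout,
the function name, and the sign convention (east and north fluxes counted as leaving the cell) are
assumptions made for this example.

```python
def conservation_residual(u, v, f, area):
    """Residual of the discrete conservation equation on every cell.

    u[i][j]: flux through the 'west' edge of cell (i, j); u[i+1][j] is its 'east' edge.
    v[i][j]: flux through the 'south' edge of cell (i, j); v[i][j+1] is its 'north' edge.
    f[i][j], area[i][j]: cell value of the source term and area of the cell.
    """
    nx, ny = len(f), len(f[0])
    residual = []
    for i in range(nx):
        row = []
        for j in range(ny):
            net_flux = u[i + 1][j] - u[i][j] + v[i][j + 1] - v[i][j]
            row.append(f[i][j] * area[i][j] - net_flux)
        residual.append(row)
    return residual
```

A flux field satisfies local conservation on the grid exactly when every entry of the returned
array is zero.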


A Multilevel Algorithm 


Previously in [7], a multigrid algorithm was developed to solve the discrete set of
equations that the mixed FVE method produces. In this algorithm the mixed FVE
discretization was used on coarser levels and interpolation and restriction were done
in a way consistent with the finite element spaces and control volumes on different
grids as in [9]. This yields a very efficient algorithm with two limitations. The first is
that the jumps in the diffusion coefficient must occur (if at all) at grid edges on the
coarse grid. The second is that the irregularity is described in a coarse grid which
is then refined by bilinear coordinates to generate finer grids. One cannot apply this
mixed FVE based multigrid algorithm to the equations on the coarsest grid; they
must be solved some other way. In practice, both these problems are limitations
on the coarsest grid; it must be fine enough to capture the reservoir geology and the
jumps in the diffusion coefficient. These limitations may result in the set of discrete
equations on the coarsest grid being too large to solve directly. With these limitations
we were forced to seek an alternative algorithm, either to be used alone or as the solver
on the coarsest grid allowable in a mixed FVE based multigrid algorithm.

We will explain our approach as a two level multigrid algorithm; this is somewhat
of a misnomer, as we will have only one grid. The fine level problem is the mixed
FVE discretization of the first order system,

$$
A^{-1} \mathbf{v} + \nabla \phi = 0, \qquad \nabla \cdot \mathbf{v} = f.
$$

We will write the mixed FVE equations in matrix form as,

$$
\begin{pmatrix} M & \mathrm{grad}_h \\ \mathrm{div}_h & 0 \end{pmatrix}
\begin{pmatrix} \mathbf{v}^h \\ \phi^h \end{pmatrix}
= \begin{pmatrix} 0 \\ f^h \end{pmatrix}. \qquad (6)
$$

Here, $M$ is the mass matrix that comes from the discretization of Darcy’s equation
and $\mathrm{grad}_h$ and $\mathrm{div}_h$ are the grid $h$ discrete operators corresponding to the
continuous operators grad and div. We define the residuals as,

$$
\begin{pmatrix} r^v \\ r^\phi \end{pmatrix}
= \begin{pmatrix} 0 \\ f^h \end{pmatrix}
- \begin{pmatrix} M & \mathrm{grad}_h \\ \mathrm{div}_h & 0 \end{pmatrix}
  \begin{pmatrix} \hat{\mathbf{v}}^h \\ \hat{\phi}^h \end{pmatrix}, \qquad (7)
$$

where the variables with hats denote a current approximate solution to equation (6).
We define the errors as,

$$
e^v = \mathbf{v}^h - \hat{\mathbf{v}}^h, \qquad e^\phi = \phi^h - \hat{\phi}^h.
$$

We then write the error equation,

$$
\begin{pmatrix} M & \mathrm{grad}_h \\ \mathrm{div}_h & 0 \end{pmatrix}
\begin{pmatrix} e^v \\ e^\phi \end{pmatrix}
= \begin{pmatrix} r^v \\ r^\phi \end{pmatrix}. \qquad (8)
$$

Now rather than using a coarser grid with the mixed FVE discretization to approximate
the error equation, we will use the same grid with a standard cell-centered finite
difference approximation. This will be our ‘coarse’ level in the multigrid algorithm.
The ‘coarse’ level version of the error equation can then be written in matrix form as,

$$
\begin{pmatrix} \tilde{M} & \mathrm{grad}_h \\ \mathrm{div}_h & 0 \end{pmatrix}
\begin{pmatrix} \tilde{\mathbf{v}}^h \\ \tilde{\phi}^h \end{pmatrix}
= \begin{pmatrix} r^v \\ r^\phi \end{pmatrix}. \qquad (9)
$$

The only difference between equations (8) and (9) is the mass matrix. Assume the
grid is rectangular and the diffusion coefficient is diagonal,

$$
A = \begin{pmatrix} a_x & 0 \\ 0 & a_y \end{pmatrix}.
$$

Then in (9) the mass matrix $\tilde{M}$ is diagonal and is computed from the diffusion
coefficient by

<i = x = °’ ( 10 ) 

The point is that in equation (9) we can eliminate the velocity variables. We have,

$$
\tilde{\mathbf{v}}^h = \tilde{M}^{-1} (-\mathrm{grad}_h \tilde{\phi}^h + r^v). \qquad (11)
$$

Using this we can write equation (9) as,

$$
-\mathrm{div}_h \tilde{M}^{-1} \mathrm{grad}_h \tilde{\phi}^h
  = r^\phi - \mathrm{div}_h \tilde{M}^{-1} r^v. \qquad (12)
$$

Black box multigrid [4] was developed to solve precisely this type of equation. In
the multilevel solver, we use black box multigrid to solve this equation for $\tilde{\phi}^h$, use
equation (11) to get $\tilde{\mathbf{v}}^h$, and use these approximations of the error to correct our
mixed FVE approximation. In summary, when the grid is rectangular, we can use a
standard cell centered finite difference discretization as the ‘coarse’ level for the ‘fine’
level mixed FVE discretization. We would like to do something similar in the case of a
general quadrilateral grid. However, one of our motivations for looking at the mixed
FVE discretization was that it can be applied in a clear and direct way to general
quadrilateral grids where standard cell centered finite differences cannot. A rigorous
cell centered finite difference discretization for general quadrilateral grids does not
currently exist. Fortunately, we do not need to ask this much of the discretization
on the ‘coarse’ level as we will use the solution from the mixed FVE discretization
for our final computation. We would like to use the ‘coarse’ level discretization
only to accelerate the relaxation process on the ‘fine’ level. We have chosen to use
equation (10) to define $\tilde{M}$ in the general quadrilateral case just as in the uniform case.
There are perhaps more sophisticated ways of defining $\tilde{M}$, but we have found that this
simple definition works well for most grids. It is clear that for very distorted grids,
our $\tilde{M}$ will be a poor approximation to $M$; however, we will see in the next section
that for mildly distorted grids the two level method works as well as in the uniform
grid case. This two level approach is similar to the work in [10] where black box
multigrid was used as a ‘coarse’ level for a Lagrangian hydrodynamics application.
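
The elimination of the velocity unknowns in equations (11) and (12) can be sketched on a
one-dimensional analogue. The sketch below is not the paper's solver: the 1D difference operators,
the zero-pressure closure outside the domain, the dense Gaussian elimination that stands in for
black box multigrid, and all function names are assumptions made for illustration.

```python
# 1D analogue of the 'coarse' system: n cells, n + 1 edges, diagonal mass matrix.

def grad(phi):
    """Pressure difference across each edge; zero pressure is assumed outside."""
    ext = [0.0] + list(phi) + [0.0]
    return [ext[k + 1] - ext[k] for k in range(len(phi) + 1)]


def div(v):
    """Net flux out of each cell (east edge minus west edge)."""
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]


def solve_dense(S, b):
    """Gaussian elimination with partial pivoting (stand-in for a multigrid solve)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(S, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def eliminate_velocity(m_diag, r_v, r_phi):
    """Return (v, phi) with  M v + grad(phi) = r_v  and  div(v) = r_phi."""
    n = len(r_phi)
    inv_m = [1.0 / m for m in m_diag]
    # Assemble S = -div M^{-1} grad column by column (left-hand side of (12)).
    S = [[0.0] * n for _ in range(n)]
    for j in range(n):
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        column = div([w * g for w, g in zip(inv_m, grad(unit))])
        for i in range(n):
            S[i][j] = -column[i]
    # Right-hand side of (12): r_phi - div M^{-1} r_v.
    rhs_flux = div([w * r for w, r in zip(inv_m, r_v)])
    rhs = [rp - d for rp, d in zip(r_phi, rhs_flux)]
    phi = solve_dense(S, rhs)
    # Recover the velocity from (11).
    v = [w * (r - g) for w, r, g in zip(inv_m, r_v, grad(phi))]
    return v, phi
```

Because the mass matrix is diagonal, the reduced operator has the sparsity of a standard
cell-centered difference stencil, which is what makes a scalar elliptic solver applicable.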


Computational Results 


Problem 1 


We begin with a test problem using a uniform square grid on $\Omega = [-1, 1] \times [-1, 1]$.
The numerical experiment is designed to test the robustness of the two level approach
with respect to discontinuities in the diffusion coefficient. The diffusion coefficients,
$a_x$ and $a_y$, were separately and randomly assigned values between .01 and 100 for each
grid cell. The problem is thus anisotropic and the coefficients jump several orders of
magnitude between cells. The two level algorithm described in the previous section
was used with two alternating line relaxation sweeps on the mixed FVE equations
before calling black box multigrid to solve the ‘coarse’ problem and one alternating
line relaxation sweep after. Here x-line relaxation, for instance, means changing all
variables, $\phi$, $v$, and $u$, associated with cells sharing the same $j$ index so that all
discrete equations (conservation and Darcy) associated with those cells are satisfied.
This involves inverting a banded $4n - 3$ by $4n - 3$ matrix, where $n$ is the number
of cells in the x-direction. This is a relatively expensive relaxation process, but it is
needed to deal with anisotropic coefficients. As pointed out in [1], block relaxation
is needed for smoothing when cells are coupled to some neighboring cells strongly
and to other neighboring cells weakly. One point to consider is how accurately
black box multigrid needs to solve the ‘coarse’ problem. In [10] only one cycle of
black box multigrid was used to approximately solve the ‘coarse’ problem. We found
that for this difficult test problem (note: black box multigrid convergence factors
were approximately equal to .6), performing more than one cycle of black box
multigrid improved the overall convergence factors of the two level method. In the
results reported below we used five cycles of black box multigrid to approximately
solve the ‘coarse’ problem, although similar multilevel convergence factors can be
obtained with fewer (say, two or three) cycles. The asymptotic convergence factors
for the two level method are presented below.


  Grid size    Convergence Factor
  16 x 16      .43
  32 x 32      .44
  64 x 64      .46


We see that this two level approach, while not having great convergence factors, 
does exhibit convergence factors that are constant with growing problem size. The 
point of considering this two level method was to allow us to deal with the problem 
where the coarsest grid for the mixed FVE based multigrid algorithm is still too fine 
and has too many unknowns to solve the mixed FVE equations using a direct method. 
In practice, one could use the mixed FVE based multigrid algorithm until one reached 
the coarsest grid that was aligned with the discontinuities in the diffusion coefficient. 
Then, on this grid, use the two level approach of the previous section. 
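
To put such convergence factors in perspective, the short sketch below estimates how many two
level cycles are needed for a given error reduction, assuming the error shrinks by the asymptotic
factor on every cycle. The reduction target of $10^{-6}$ is an arbitrary choice for illustration
and does not come from the paper.

```python
import math


def cycles_needed(convergence_factor, reduction):
    """Smallest k with convergence_factor**k <= reduction."""
    return math.ceil(math.log(reduction) / math.log(convergence_factor))


# Cycle counts for the factors reported above, for a reduction of 1e-6 (assumed target).
cycles = {rho: cycles_needed(rho, 1e-6) for rho in (0.43, 0.44, 0.46)}
```

Under this assumption, factors between .43 and .46 correspond to 17 or 18 cycles for a six order
of magnitude reduction.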


Problem 2 


In this experiment we began with a uniform square grid on $\Omega = [-1, 1] \times [-1, 1]$
and distorted the grid in the following way. We moved each interior vertex in both the
x and y directions separately by a random number between $-.2h$ and $.2h$, where $h$ was
the mesh size of the original square mesh. The resulting mesh for the 16 x 16 problem
is shown in Figure 5. Then we discretized Poisson’s equation, $a_x = a_y = 1$, using the
mixed FVE method and applied the two level algorithm described previously. Again,
two alternating line relaxation sweeps were performed before solving the ‘coarse’
problem and one alternating line relaxation sweep was performed after, and five cycles
of black box multigrid were used to solve the ‘coarse’ problem. Average convergence
factors for the two level approach on different size grids are shown below.

Figure 5


Grid size      Convergence Factor
16 x 16        .07
32 x 32        .08
64 x 64        .08


The convergence factors are surprisingly good, given the quality of the approxi- 
mation used on the ‘coarse’ level. As discussed previously, we basically assume the 
grid is uniform in forming the mass matrix M for the discrete Darcy equations on the 
‘coarse’ level. This appears to work fine for the mildly distorted grids like the grids 
in this numerical experiment and, quite likely, the grids one would use in practical 
applications. When the grid is very distorted, say 50% rather than 20% distortion, 
the two level algorithm can fail to converge and may even diverge. The reason is 
that the very poor approximation of the mass matrix results in a correction from the 
‘coarse’ level that has little, if anything, to do with the ‘fine’ level error. It is possible 
that this could be remedied by a more sophisticated choice for M, but this has not 
been investigated. 


Problem 3 


In the next numerical experiment we use the same grids as in the previous experiment 
and solve the mixed FVE discretization of the diffusion equation with a diagonal 
diffusion coefficient, where on each cell of the grid the diffusion coefficients a_x and 
a_y were separately set to random values between .01 and 100. The average 
convergence factors are shown below. 


Grid size      Convergence Factor
16 x 16        .43
32 x 32        .40
64 x 64        .38


While the convergence factors are not that great, they likely are acceptable espe- 
cially if one is using the two level approach only on the coarsest grid of the mixed 
FVE based multigrid algorithm. There the amount of work on finer grids in the 
mixed FVE based multigrid algorithm will be much larger than the work of the two 
level algorithm on the coarsest grid, even if several cycles of the two level algorithm 
are required. This last experiment is reflective of the types of problems one would 
solve in actual reservoir simulation. It appears that this two level approach has the 
potential to provide a fast solution to the more accurate mixed FVE discretization, 
compared to standard cell centered finite differences, in cases where the previously 
developed mixed FVE based multigrid algorithm cannot be applied. 
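To put the reported convergence factors in perspective, the following minimal sketch estimates how many two level cycles would be needed to reduce the error by a prescribed factor. The reduction target of 1e-6 and the function name `cycles_needed` are assumptions made for illustration only; the factors are the 64 x 64 values from the three tables above.

```python
import math

def cycles_needed(rho, reduction=1e-6):
    """Smallest k with rho**k <= reduction, for a convergence factor 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("convergence factor must lie strictly between 0 and 1")
    return math.ceil(math.log(reduction) / math.log(rho))

# Convergence factors reported for the 64 x 64 grids of Problems 1-3.
factors = {"Problem 1": 0.46, "Problem 2": 0.08, "Problem 3": 0.38}
for name, rho in factors.items():
    print(name, cycles_needed(rho))
```

Under this assumed target, a factor near .4 costs roughly three times as many cycles as a factor near .08, which is consistent with the remark that several cycles on the coarsest grid remain cheap relative to the work on the finer grids.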


Conclusions 


The two level algorithm presented in this paper provides an efficient method for 
solving the mixed FVE equations on general quadrilateral grids. One point about 
the “poor” convergence factors for the two level method seen in problems 1 and 
3: these results, in an indirect way, illustrate the superiority of the mixed FVE 
discretization over the standard cell centered finite difference discretization when the 
diffusion coefficient is discontinuous, even on uniform grids. The “poor” convergence 
factors tell us that there is a significant difference between the discretizations, and as 
demonstrated in [7], the mixed FVE discretization is the more accurate of the two. 
Implicit Extrapolation Methods for Variable Coefficient Problems 


M. Jung 
U. Rüde 


SUMMARY 

Implicit extrapolation methods for the solution of partial differential equations 
are based on applying the extrapolation principle indirectly. Multigrid 
τ-extrapolation is a special case of this idea. In the context of multilevel finite element 
methods, an algorithm of this type can be used to raise the approximation order, 
even when the meshes are nonuniform or locally refined. Here previous results are 
generalized to the variable coefficient case and thus become applicable for nonlinear 
problems. The implicit extrapolation multigrid algorithm converges to the solution 
of a higher order finite element system. This is obtained without explicitly construct- 
ing higher order stiffness matrices but by applying extrapolation in a natural form 
within the algorithm. The algorithm requires only a small change of a basic low order 
multigrid method. 


Introduction 


Implicit extrapolation is an efficient technique to improve the accuracy of a multilevel 
solver. When combined with extrapolation, the multilevel principle is not only used 
as the basis for a fast algebraic solver, but also to increase the approximation order. 
The basic idea of extrapolation is to exploit discretizations on different levels. 

In classical Richardson extrapolation, two or more approximations from different 
meshes are combined linearly to eliminate the dominating terms of the error expan- 
sion. For partial differential equations this has been studied in the context of finite 
difference discretizations, see e.g. Marchuk and Shaidurov [1] and in the framework 
of finite elements (FE), see e.g. Blum, Lin, and Rannacher [2]. These techniques are 
explicit extrapolation methods, since they use approximate solutions directly. 
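As a point of comparison for what follows, here is a minimal sketch of explicit Richardson extrapolation in its simplest setting: two trapezoidal-rule approximations with mesh sizes h and h/2 are combined linearly so that the leading h^2 error term cancels. The integrand, the interval and the function names are illustrative assumptions, not taken from the paper; the weights 4/3 and -1/3 are the same ones that appear later in the extrapolated stiffness matrix.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals (second-order accurate)."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def richardson(coarse, fine):
    """Combine second-order approximations from mesh sizes h and h/2."""
    return (4.0 * fine - coarse) / 3.0

exact = math.e - 1.0                       # integral of exp(x) over [0, 1]
t_h = trapezoid(math.exp, 0.0, 1.0, 8)     # mesh size h = 1/8
t_h2 = trapezoid(math.exp, 0.0, 1.0, 16)   # mesh size h/2 = 1/16
extrapolated = richardson(t_h, t_h2)
print(abs(t_h - exact), abs(t_h2 - exact), abs(extrapolated - exact))
```

The extrapolated value is formed from the two approximate solutions themselves, which is what makes this an explicit method in the sense described above.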

Here we propose a different approach, where extrapolation is applied indirectly 
to intermediate quantities of the solution process. Such methods are called implicit 
extrapolation techniques. Methods of this type may be related to defect correction 
and, if combined with multigrid, to τ-extrapolation, see e.g. Brandt [3], Hackbusch [4], 
Schaffer [5], or Bernert [6]. However, these methods are mathematically 
still motivated by expansions of the truncation error, which in turn require uniform 
meshes. A generalization to locally uniform meshes can be found, e.g., in McCormick 
and Rüde [7]. 

In Jung and Rüde [8] we have presented an implicit finite element extrapolation 
technique which is based on extrapolating the quadrature rules used to compute the 
stiffness matrices. In [8] it has been shown that within the nested spaces of a 
multilevel finite element algorithm, this implicit extrapolation converts an h-hierarchical 
to a p-hierarchical basis. This improves the approximation order, independent of 
any uniformity constraints on the mesh and without requiring global asymptotic 
error expansions. On the other hand, the algorithm presented in [8] is algebraically 
just a special case of multigrid τ-extrapolation, which differs from the usual 
multilevel process only by an additional factor appearing in the restriction of the residual. 
The method is therefore particularly convenient to implement in any given multigrid 
algorithm. 

The analysis of [8] was still restricted to problems with element-wise constant 
coefficients. In the present paper we generalize these results to show that an 
analogous algorithm can be used for variable coefficients as long as the coefficients are 
smooth enough to justify higher order approximations at all. The analysis is again 
based on studying quadrature formulas for the stiffness matrices, and using extrapo- 
lation to construct quadrature formulas which are exact for higher order polynomial 
functions. For variable coefficients, this is now significantly more complicated and 
our analysis requires nonstandard quadrature rules. These rules and the multilevel 
algorithm are introduced in detail. The final section presents a numerical example 
showing the efficiency of the method. 


The boundary value problem and its finite element discretization 


In this paper we consider two-dimensional second order elliptic boundary value 
problems given in the weak formulation 

$$\text{Find } u \in V_0 \text{ such that } a(u,v) = \langle F, v \rangle \text{ for all } v \in V_0, \tag{1}$$

with 

$$a(u,v) = \int_\Omega (A(x)\nabla_x u, \nabla_x v)\,dx \tag{2}$$

and 

$$\langle F, v \rangle = \int_\Omega f v\,dx. \tag{3}$$

$\Omega$ is a two-dimensional bounded polygonal domain. The space $V_0 = H_0^1(\Omega)$ is a 
subspace of the Sobolev space $H^1(\Omega)$, where the functions of $V_0$ satisfy homogeneous 
Dirichlet boundary conditions on the boundary $\partial\Omega$. The restriction to this type 
of boundary conditions is only to keep the exposition as simple as possible. The 
generalization to somewhat more general boundary conditions is analogous to [8]. 

Furthermore, we suppose that the 2x2 matrix $A(x) = (a_{ij}(x))_{i,j=1,2}$ is symmetric 
and positive definite for almost all $x \in \Omega$ with $a_{ij}(x) \in W^2_\infty(\Omega)$. The function $f$ 
belongs to the space $W^2_q(\Omega)$ with $q \geq 2$. We need these assumptions to obtain a 
discretization error which is typical for FE discretizations with piecewise quadratic 
functions and the application of appropriate quadrature rules for the computation of 
the stiffness matrix and the load vector. 

We now discretize (1) by three different finite element spaces. We suppose that two 
nested triangulations $\mathcal{T}_{l-1}$ and $\mathcal{T}_l$ of the domain $\Omega$ are given. The finer triangulation 
$\mathcal{T}_l$ results from $\mathcal{T}_{l-1}$ by regular refinement, that is by connecting the midpoints of all 
triangles $\delta^{(r)}_{l-1}$, $r = 1, 2, \ldots$, in $\mathcal{T}_{l-1}$. Corresponding to the triangulations $\mathcal{T}_{l-1}$ 
and $\mathcal{T}_l$ we introduce the finite element spaces 

$$V^L_{l-1} = \mathrm{span}\{p^{(i)}_{l-1} : i = 1, 2, \ldots, N_{l-1}\} \subset V_0, \tag{4}$$

$$V^L_l = V^L_{l-1} \cup \mathrm{span}\{p^{(i)}_l : i = N_{l-1}+1, \ldots, N_l\} \subset V_0, \tag{5}$$

$$V^Q_l = V^L_{l-1} \cup \mathrm{span}\{q^{(i)}_{l-1} : i = N_{l-1}+1, \ldots, N_l\} \subset V_0. \tag{6}$$

The trial functions $p^{(i)}_k$, $k = l, l-1$, are continuous and piecewise linear in each 
triangle of $\mathcal{T}_k$ and they satisfy 

$$p^{(i)}_{l-1}(x^{(j)}) = \delta_{ij} \quad \text{for } i,j = 1, 2, \ldots, N_{l-1},$$
$$p^{(i)}_l(x^{(j)}) = \delta_{ij} \quad \text{for } i,j = N_{l-1}+1, \ldots, N_l.$$

Here $x^{(j)} = (x_1^{(j)}, x_2^{(j)})$ denotes the coordinates of the node $P^{(j)}$ and $N_k$ is the 
number of nodes of $\mathcal{T}_k$ in $\Omega$. $\delta_{ij}$ is the Kronecker symbol. 

The functions $q^{(i)}_{l-1}$, $i = N_{l-1}+1, \ldots, N_l$, of (6) are continuous and piecewise 
quadratic in each triangle of $\mathcal{T}_{l-1}$. Again, they satisfy 

$$q^{(i)}_{l-1}(x^{(j)}) = \delta_{ij} \quad \text{for } i,j = N_{l-1}+1, \ldots, N_l.$$

The basis of the space $V^L_l$ is called the h-hierarchical basis and the basis of the space $V^Q_l$ 
is called the p-hierarchical basis. 

The finite element subspaces $V^L_{l-1}$, $V^L_l$, $V^Q_l$ of (4), (5), and (6), respectively, give 
rise to the finite element stiffness matrices $K^L_{l-1}$, $K^L_l$, and $K^Q_l$ as well as the load 
vectors $f^L_{l-1}$, $f^L_l$, and $f^Q_l$. 

For the computation of the coefficients of the element stiffness matrices and the 
element load vectors in general we must perform numerical integration. We therefore 
need an appropriate quadrature rule which guarantees the same FE discretization 
error as in the case of exact computation of the stiffness matrix and the load vector. 
To investigate the effect of numerical integration we will use well-known results as 
e.g. contained in [9]. For the sake of completeness we summarize some of them. 

The application of quadrature rules for the computation of the matrix elements 
and the elements of the load vector results in an approximate bilinear form $\tilde a(u,v)$ and 
an approximate right-hand side $\langle \tilde F, v \rangle$. Depending on the choice of the quadrature 
rule and the finite element subspace $V$, i.e. $V = V^L_{l-1}$, $V = V^L_l$, or $V = V^Q_l$, we will 
later describe $\tilde a(u,v)$ in detail. 

The approximate bilinear form is called uniformly $V$-elliptic, if there exists a 
constant $\tilde\alpha > 0$, $\tilde\alpha$ independent of $V$, such that 

$$\tilde a(v,v) \geq \tilde\alpha \|v\|^2_{1,2,\Omega} \quad \text{for all } v \in V.$$

Here $\|\cdot\|_{1,2,\Omega}$ denotes the norm in the Sobolev space $H^1(\Omega)$. 

Using numerical integration, the boundary value problem (1) is approximated by 

$$\text{Find } \tilde u \in V \text{ such that } \tilde a(\tilde u, v) = \langle \tilde F, v \rangle \text{ for all } v \in V. \tag{7}$$


Theorem 1. (First Lemma of Strang) Let the approximate bilinear form $\tilde a$ of (7) 
be uniformly $V$-elliptic. Then 

$$\|u - \tilde u\|_{1,2,\Omega} \leq c \left( \inf_{v \in V} \left\{ \|u - v\|_{1,2,\Omega} + \sup_{w \in V} \frac{|a(v,w) - \tilde a(v,w)|}{\|w\|_{1,2,\Omega}} \right\} + \sup_{w \in V} \frac{|\langle F, w \rangle - \langle \tilde F, w \rangle|}{\|w\|_{1,2,\Omega}} \right)$$

with a constant $c$ which does not depend on the space $V$. 

Let the solution $u \in H^{s+1}(\Omega)$, $a_{ij} \in W^s_\infty(\Omega)$, $i,j = 1,2$, $f \in W^s_q(\Omega)$ with $q \geq 2$ 
and $q > 2/s$, and let the FE subspace $V$ contain piecewise polynomials of degree $s$, 
i.e. polynomials of degree $s$ on the triangles of the triangulation. Furthermore, let 
the quadrature rule be exact for polynomials of degree $2s - 2$ on each triangle. Then 
the following estimate holds (see also [9]) 

$$\|u - \tilde u\|_{1,2,\Omega} \leq c h^s \left( |u|_{s+1,2,\Omega} + \sum_{i,j=1}^{2} \|a_{ij}\|_{s,\infty,\Omega} \|u\|_{s+1,2,\Omega} + \|f\|_{s,q,\Omega} \right).$$

Here $\|\cdot\|_{s+1,2,\Omega}$ and $|\cdot|_{s+1,2,\Omega}$ denote the norm and seminorm in $H^{s+1}(\Omega)$, and 
$\|\cdot\|_{s,q,\Omega}$ is a norm in $W^s_q(\Omega)$. 


A multigrid algorithm with implicit extrapolation step 


In Jung and Rüde [8] we have studied the convergence properties of a multigrid algorithm 
with implicit extrapolation step. However, the analysis in [8] was restricted to problems 
with piecewise constant functions $a_{ij}(x)$ and $f(x)$ in $\Omega$. If such a problem is 
discretized by linear elements, and the multigrid algorithm is combined with (implicit) 
extrapolation, the iterates converge to the solution given by quadratic elements. In 
this paper we will generalize this result to the case of variable coefficients. It will 
be shown that the extrapolation algorithm converges to the solution obtained with 
quadratic elements. In the analysis of this more general case, we will use special 
nonstandard quadrature rules. 

In the following we will give a brief description of the smoothing procedure and 
the restriction operator used. Then we formulate the multigrid algorithm and study 
the convergence behavior. 
Numbering the nodes in $\mathcal{T}_l$ such that the nodes which are also in the coarse mesh 
$\mathcal{T}_{l-1}$ appear first, we induce a block partitioning of the stiffness matrices 

$$K^L_l = \begin{pmatrix} K^L_{l,vv} & K^L_{l,vm} \\ K^L_{l,mv} & K^L_{l,mm} \end{pmatrix}, \qquad K^Q_l = \begin{pmatrix} K^Q_{l,vv} & K^Q_{l,vm} \\ K^Q_{l,mv} & K^Q_{l,mm} \end{pmatrix}. \tag{8}$$


In the multigrid algorithm we use the following smoothing procedures: 

• Pre-smoothing $G_l(u_l^{(j)}, K_l^L, f_l^L)$: Let the initial guess $u_l^{(j)} = (u_{l,v}^{(j)}, u_{l,m}^{(j)})^T$ be 
given. Set $u_{l,v}^{(j+1)} = u_{l,v}^{(j)}$ and compute an approximate solution $\tilde z_{l,m}$ of the system 

$$K^L_{l,mm} z_{l,m} = f^L_{l,m} - K^L_{l,mv} u_{l,v}^{(j+1)} - K^L_{l,mm} u_{l,m}^{(j)} \tag{9}$$

by means of a linear iterative method starting with the zero vector. We suppose 
that the error transmission operator of the method is of the type 
$I_{l,m} - [\ldots]$. Then set $u_l^{(j+1)} = (u_{l,v}^{(j+1)}, u_{l,m}^{(j)} + \tilde z_{l,m})^T$. 

• Post-smoothing $G_l^T(u_l^{(j)}, K_l^L, f_l^L)$: We use the same form of algorithm as for 
pre-smoothing. However, we suppose that the error transmission operator of 
the iterative method is of the form $M_{l,m} = I_{l,m} - [\ldots]$ such that the 
overall multigrid operator becomes symmetric. 

• We need the injection operator $I_l^{l-1,\mathrm{inj}}$ in our algorithm. 

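Algorithm MG-EX 
Let an initial guess $u_l^{(k,0)}$ be given. 

1. Pre-smoothing: 

$$u_l^{(k,1)} = G_l(u_l^{(k,0)}, K_l^L, f_l^L). \tag{10}$$

2. Coarse grid correction: 

(a) Compute the defect 

$$d_{l-1}^{(k)} = \tfrac{4}{3}\, I_l^{l-1,\mathrm{inj}} \left( f_l^L - K_l^L u_l^{(k,1)} \right) - \tfrac{1}{3} \left( f_{l-1}^L - K_{l-1}^L I_l^{l-1,\mathrm{inj}} u_l^{(k,1)} \right). \tag{11}$$

(b) Solve 

$$K^L_{l-1} w_{l-1}^{(k)} = d_{l-1}^{(k)} \tag{12}$$

using $\mu$ iteration steps of a usual symmetric multigrid ($(l-1)$-grid) algorithm, 
starting with the 0 vector and returning an approximate solution $\tilde w_{l-1}^{(k)}$. 

(c) Correct 

$$u_l^{(k,2)} = \left( u_{l,v}^{(k,1)} + \tilde w_{l-1}^{(k)},\; u_{l,m}^{(k,1)} \right)^T. \tag{13}$$

3. Post-smoothing: 

$$u_l^{(k,3)} = G_l^T(u_l^{(k,2)}, K_l^L, f_l^L) \tag{14}$$

and set $u_l^{(k+1,0)} = u_l^{(k,3)}$. 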

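Taking into consideration the definition of the smoothing procedures and the 
equivalence of step 2(a) to 

$$d_{l-1}^{(k)} = \left( \tfrac{4}{3} f^L_{l,v} - \tfrac{1}{3} f^L_{l-1} \right) - \left( \tfrac{4}{3} K^L_{l,vv} - \tfrac{1}{3} K^L_{l-1} \right) u_{l,v}^{(k,1)} - \tfrac{4}{3} K^L_{l,vm} u_{l,m}^{(k,1)} \tag{15}$$

we can interpret our algorithm as a usual multigrid algorithm in the h-hierarchical 
basis to solve the system of equations 

$$K_l^{L,ex} u_l = f_l^{L,ex} \tag{16}$$

with 

$$K_l^{L,ex} = \begin{pmatrix} \tfrac{4}{3} K^L_{l,vv} - \tfrac{1}{3} K^L_{l-1} & \tfrac{4}{3} K^L_{l,vm} \\ \tfrac{4}{3} K^L_{l,mv} & \tfrac{4}{3} K^L_{l,mm} \end{pmatrix}, \qquad f_l^{L,ex} = \begin{pmatrix} \tfrac{4}{3} f^L_{l,v} - \tfrac{1}{3} f^L_{l-1} \\ \tfrac{4}{3} f^L_{l,m} \end{pmatrix}. \tag{17}$$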
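The equivalence between the defect of step 2(a) and the coarse-block residual of the extrapolated system can be checked on small data. The sketch below assumes the reading of step 2(a) as 4/3 times the injected fine residual minus 1/3 times the coarse residual, with injection simply selecting the coarse-node (v) components because those nodes are numbered first. The matrices, vectors and function names are hypothetical illustration values, not data from the paper; exact rational arithmetic is used so the comparison is not blurred by rounding.

```python
from fractions import Fraction as Fr

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def defect_step_2a(K_vv, K_vm, K_c, f_v, f_c, u_v, u_m):
    """4/3 * (v-part of the fine residual) - 1/3 * (coarse residual)."""
    fine = [f - a - b for f, a, b in zip(f_v, matvec(K_vv, u_v), matvec(K_vm, u_m))]
    coarse = [f - a for f, a in zip(f_c, matvec(K_c, u_v))]
    return [Fr(4, 3) * rf - Fr(1, 3) * rc for rf, rc in zip(fine, coarse)]

def defect_extrapolated(K_vv, K_vm, K_c, f_v, f_c, u_v, u_m):
    """v-block residual of the extrapolated system built as in (17)."""
    n = len(f_v)
    K_ex_vv = [[Fr(4, 3) * K_vv[i][j] - Fr(1, 3) * K_c[i][j] for j in range(n)]
               for i in range(n)]
    K_ex_vm = [[Fr(4, 3) * a for a in row] for row in K_vm]
    f_ex_v = [Fr(4, 3) * a - Fr(1, 3) * b for a, b in zip(f_v, f_c)]
    return [f - a - b
            for f, a, b in zip(f_ex_v, matvec(K_ex_vv, u_v), matvec(K_ex_vm, u_m))]

# Hypothetical data: 2 coarse (v) unknowns and 3 midpoint (m) unknowns.
K_vv = [[Fr(4), Fr(-1)], [Fr(-1), Fr(4)]]
K_vm = [[Fr(-1), Fr(0), Fr(-2)], [Fr(0), Fr(-1), Fr(-1)]]
K_c = [[Fr(3), Fr(-1)], [Fr(-1), Fr(5)]]
f_v, f_c = [Fr(1), Fr(2)], [Fr(2), Fr(1)]
u_v, u_m = [Fr(1, 2), Fr(-1, 3)], [Fr(1), Fr(2), Fr(-1)]

d_a = defect_step_2a(K_vv, K_vm, K_c, f_v, f_c, u_v, u_m)
d_b = defect_extrapolated(K_vv, K_vm, K_c, f_v, f_c, u_v, u_m)
print(d_a == d_b)
```

The point of the check is that the extrapolated matrix never has to be assembled: the only change to a standard multigrid code is the weighting applied when the residual is transferred to the coarse level.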


The main result of this section is that the iterates of the algorithm MG-EX 
converge to a FE solution which has the same order of discretization error as a FE 
solution obtained by p-hierarchical FE functions (p = 2). 

Before we prove this fact, we introduce the quadrature rules that are used to 
compute the stiffness matrices and load vectors. 

To obtain the entries of the stiffness matrices $K^L_{l-1}$, $K^L_l$, and $K^Q_l$, respectively, we 
have to compute 

$$a(p^{(j)}, p^{(i)}) = \int_\Omega (A(x)\nabla_x p^{(j)}(x), \nabla_x p^{(i)}(x))\,dx, \tag{18}$$

where $p^{(i)}$, $p^{(j)}$ stand for the functions $p^{(i)}_{l-1}$, $p^{(j)}_{l-1}$, $i,j = 1, \ldots, N_{l-1}$, and $p^{(i)}_l$, $p^{(j)}_l$, 
$i,j = N_{l-1}+1, \ldots, N_l$, in the case of the h-hierarchical basis. In the case of the 
p-hierarchical basis the functions $p^{(i)}$, $p^{(j)}$ stand for $p^{(i)}_{l-1}$, $p^{(j)}_{l-1}$, $i,j = 1, \ldots, N_{l-1}$, 
and $q^{(i)}_{l-1}$, $q^{(j)}_{l-1}$, $i,j = N_{l-1}+1, \ldots, N_l$. 

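First we explain the quadrature rules used for the computation of the matrices 
$K^L_{l-1}$ and $K^L_l$. From (18) we obtain for the entries of $K^L_{l-1}$ 

$$K^{L,(ij)}_{l-1} = \int_\Omega (A(x)\nabla_x p^{(j)}_{l-1}(x), \nabla_x p^{(i)}_{l-1}(x))\,dx = \sum_{r \in \omega^{(ij)}_{l-1}} \int_{\delta^{(r)}_{l-1}} (A(x)\nabla_x p^{(j)}_{l-1}(x), \nabla_x p^{(i)}_{l-1}(x))\,dx, \tag{19}$$

where 

$$\omega^{(ij)}_{l-1} = \{ r : p^{(i)}_{l-1} \not\equiv 0 \text{ and } p^{(j)}_{l-1} \not\equiv 0 \text{ on } \delta^{(r)}_{l-1} \}. \tag{20}$$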
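We transform the integrals over $\delta^{(r)}_{l-1}$ into integrals over the reference element 
$\Delta = \{(\xi_1, \xi_2) : 0 \leq \xi_1 \leq 1,\ 0 \leq \xi_2 \leq 1,\ \xi_1 + \xi_2 \leq 1\}$. This leads to 

$$\int_{\delta^{(r)}_{l-1}} (A(x)\nabla_x p^{(j)}_{l-1}(x), \nabla_x p^{(i)}_{l-1}(x))\,dx$$
$$= \int_\Delta \left( A(x(\xi)) (J^{(r)}_{l-1})^{-T} \nabla_\xi p^{(j)}_{l-1}(x(\xi)),\ (J^{(r)}_{l-1})^{-T} \nabla_\xi p^{(i)}_{l-1}(x(\xi)) \right) |\det J^{(r)}_{l-1}|\,d\xi$$
$$= \int_\Delta \left( B^{(r)}(x(\xi)) \nabla_\xi \varphi_{\beta^{(r)}}(\xi),\ \nabla_\xi \varphi_{\alpha^{(r)}}(\xi) \right) d\xi, \tag{21}$$

with $B^{(r)}(x) = (J^{(r)}_{l-1})^{-1} A(x) (J^{(r)}_{l-1})^{-T} |\det J^{(r)}_{l-1}|$ and $J^{(r)}_{l-1}$ from the transformation 

$$x = J^{(r)}_{l-1} \xi + x^{(r,1)}, \qquad J^{(r)}_{l-1} = \begin{pmatrix} x_1^{(r,2)} - x_1^{(r,1)} & x_1^{(r,3)} - x_1^{(r,1)} \\ x_2^{(r,2)} - x_2^{(r,1)} & x_2^{(r,3)} - x_2^{(r,1)} \end{pmatrix}. \tag{22}$$

Here $x_i^{(r,\alpha)}$, $i = 1,2$, $\alpha = 1,2,3$, denotes the coordinates of the vertices of the 
triangle $\delta^{(r)}_{l-1}$, and $\alpha^{(r)}$ as well as $\beta^{(r)}$ are the local numbers of the vertices $P^{(i)}$ and 
$P^{(j)}$. The linear functions $\varphi_{\alpha^{(r)}}$, $\varphi_{\beta^{(r)}}$, $\alpha^{(r)}, \beta^{(r)} = 1,2,3$, on the reference element are 
defined by 

$$\varphi_1(\xi) = 1 - \xi_1 - \xi_2, \quad \varphi_2(\xi) = \xi_1, \quad \text{and} \quad \varphi_3(\xi) = \xi_2. \tag{23}$$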


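The following equivalent formulation of (21) is the basis of the application of our 
quadrature rules. 

With the directional derivative 

$$\frac{\partial \varphi}{\partial \xi_s} = \frac{\partial \varphi}{\partial \xi_1} - \frac{\partial \varphi}{\partial \xi_2} \tag{24}$$

we obtain 

$$\int_\Delta \left( b^{(r)}_{11} \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_1} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_1} + b^{(r)}_{12} \left( \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_1} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_2} + \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_2} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_1} \right) + b^{(r)}_{22} \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_2} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_2} \right) d\xi$$
$$= \int_\Delta \left( \tilde b^{(r)}_{11} \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_1} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_1} + \tilde b^{(r)}_{22} \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_2} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_2} + \tilde b^{(r)}_{12} \frac{\partial \varphi_{\beta^{(r)}}}{\partial \xi_s} \frac{\partial \varphi_{\alpha^{(r)}}}{\partial \xi_s} \right) d\xi \tag{25}$$

where 

$$\tilde b^{(r)}_{11}(x(\xi)) = b^{(r)}_{11}(x(\xi)) + b^{(r)}_{12}(x(\xi)), \quad \tilde b^{(r)}_{22}(x(\xi)) = b^{(r)}_{22}(x(\xi)) + b^{(r)}_{12}(x(\xi)), \quad \tilde b^{(r)}_{12}(x(\xi)) = -b^{(r)}_{12}(x(\xi)).$$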
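The rewriting in (25) is a pointwise algebraic identity, so it can be checked directly on the constant gradients of the linear reference functions (23). The sketch below assumes the directional derivative is the difference of the two coordinate derivatives (its sign does not matter, since only products of two such derivatives occur); the entries chosen for the symmetric matrix B and the function names are arbitrary illustration values.

```python
from fractions import Fraction as Fr

def two_index_form(b11, b12, b22, g_beta, g_alpha):
    """(B grad_beta, grad_alpha) for a symmetric 2x2 matrix B."""
    return (b11 * g_beta[0] * g_alpha[0]
            + b12 * (g_beta[0] * g_alpha[1] + g_beta[1] * g_alpha[0])
            + b22 * g_beta[1] * g_alpha[1])

def three_term_form(b11, b12, b22, g_beta, g_alpha):
    """Same quantity using d/dxi_1, d/dxi_2 and d/dxi_s = d/dxi_1 - d/dxi_2."""
    bt11, bt22, bt12 = b11 + b12, b22 + b12, -b12
    s_beta = g_beta[0] - g_beta[1]
    s_alpha = g_alpha[0] - g_alpha[1]
    return (bt11 * g_beta[0] * g_alpha[0]
            + bt22 * g_beta[1] * g_alpha[1]
            + bt12 * s_beta * s_alpha)

# Gradients of phi_1 = 1 - xi1 - xi2, phi_2 = xi1, phi_3 = xi2.
grads = [(Fr(-1), Fr(-1)), (Fr(1), Fr(0)), (Fr(0), Fr(1))]
b11, b12, b22 = Fr(3), Fr(-1, 2), Fr(2)   # hypothetical symmetric B

identity_holds = all(
    two_index_form(b11, b12, b22, gb, ga) == three_term_form(b11, b12, b22, gb, ga)
    for gb in grads for ga in grads
)
print(identity_holds)
```

Each of the three terms then involves derivatives along a single edge direction of the reference triangle, which is what allows a separate one-point rule to be attached to each term.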
For the numerical integration of the three terms in (25) we use the following three 
quadrature rules 

$$\int_\Delta v(\xi)\,d\xi \approx v(\xi^{(\sigma)})\,\mathrm{meas}\,\Delta, \quad \sigma = 1,2,3, \tag{26}$$

with 

$$\xi^{(1)} = \left(\tfrac{1}{2}, 0\right), \quad \xi^{(2)} = \left(0, \tfrac{1}{2}\right), \quad \text{and} \quad \xi^{(3)} = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \tag{27}$$

respectively. Obviously the quadrature rules in (26) are exact for constant functions $v$. 

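The elements of the matrix $K^L_l$ are computed in the same way. We can write the 
expressions for the computation of the matrix elements in the following formulation 

$$a(p^{(j)}, p^{(i)}) = \sum_{r \in \omega^{(ij)}_{l-1}} \int_{\delta^{(r)}_{l-1}} (A(x)\nabla_x p^{(j)}(x), \nabla_x p^{(i)}(x))\,dx$$
$$= \sum_{r \in \omega^{(ij)}_{l-1}} \int_\Delta \left( A(x(\xi)) (J^{(r)}_{l-1})^{-T} \nabla_\xi p^{(j)}(x(\xi)),\ (J^{(r)}_{l-1})^{-T} \nabla_\xi p^{(i)}(x(\xi)) \right) |\det J^{(r)}_{l-1}|\,d\xi$$
$$= \sum_{r \in \omega^{(ij)}_{l-1}} \sum_{k=1}^{4} \int_{\Delta^{(k)}} \left( B^{(r)}(x(\xi)) \nabla_\xi \varphi_{\beta^{(r)}}(\xi),\ \nabla_\xi \varphi_{\alpha^{(r)}}(\xi) \right) d\xi \tag{28}$$

where again $\alpha^{(r)}, \beta^{(r)} = 1, 2, \ldots, 6$, are the local numbers of the nodes $P^{(i)}$ and $P^{(j)}$, 
$\omega^{(ij)}_{l-1} = \{ r : p^{(i)} \not\equiv 0 \text{ and } p^{(j)} \not\equiv 0 \text{ on } \delta^{(r)}_{l-1} \}$, $\Delta = \bigcup_{k=1}^{4} \Delta^{(k)}$ (see also Figure 1), and 

$$\varphi_1(\xi) = 1 - \xi_1 - \xi_2, \quad \varphi_2(\xi) = \xi_1, \quad \varphi_3(\xi) = \xi_2,$$

$$\varphi_4(\xi) = \begin{cases} 2\xi_1 & \text{in } \Delta^{(1)} \\ 2 - 2\xi_1 - 2\xi_2 & \text{in } \Delta^{(2)} \\ 0 & \text{in } \Delta^{(3)} \\ 1 - 2\xi_2 & \text{in } \Delta^{(4)} \end{cases} \qquad \varphi_5(\xi) = \begin{cases} 0 & \text{in } \Delta^{(1)} \\ 2\xi_2 & \text{in } \Delta^{(2)} \\ 2\xi_1 & \text{in } \Delta^{(3)} \\ 2\xi_1 + 2\xi_2 - 1 & \text{in } \Delta^{(4)} \end{cases} \qquad \varphi_6(\xi) = \begin{cases} 2\xi_2 & \text{in } \Delta^{(1)} \\ 0 & \text{in } \Delta^{(2)} \\ 2 - 2\xi_1 - 2\xi_2 & \text{in } \Delta^{(3)} \\ 1 - 2\xi_1 & \text{in } \Delta^{(4)} \end{cases} \tag{29}$$

To compute each integral over $\Delta^{(k)}$ in (28) we use the equivalent formulation of 
type (25) and a quadrature rule of type (26). 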

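In the case of the p-hierarchical basis, we have to compute the entries of the 
matrix $K^Q_l$, i.e. expressions of the form (18), where $p^{(i)}$, $p^{(j)}$ stand for the functions 
$p^{(i)}_{l-1}$, $p^{(j)}_{l-1}$, $i,j = 1, \ldots, N_{l-1}$, and $q^{(i)}_{l-1}$, $q^{(j)}_{l-1}$, $i,j = N_{l-1}+1, \ldots, N_l$. 
Again we get 

$$\int_\Omega (A(x)\nabla_x p^{(j)}(x), \nabla_x p^{(i)}(x))\,dx = \sum_{r \in \omega^{(ij)}_{l-1}} \int_{\delta^{(r)}_{l-1}} (A(x)\nabla_x p^{(j)}(x), \nabla_x p^{(i)}(x))\,dx, \tag{30}$$

with $\omega^{(ij)}_{l-1}$ from (20). After the transformation of the integrals over $\delta^{(r)}_{l-1}$ into integrals 
over the reference element $\Delta$ we obtain the integrals 

$$\int_\Delta \left( B^{(r)}(x(\xi)) \nabla_\xi \psi_{\beta^{(r)}}(\xi),\ \nabla_\xi \psi_{\alpha^{(r)}}(\xi) \right) d\xi. \tag{31}$$

Figure 1: An arbitrary triangle and the reference element $\Delta$ 

The functions $\psi_{\alpha^{(r)}}$ and $\psi_{\beta^{(r)}}$, $\alpha^{(r)}, \beta^{(r)} = 1, 2, \ldots, 6$, are defined by 

$$\psi_1(\xi) = 1 - \xi_1 - \xi_2, \quad \psi_2(\xi) = \xi_1, \quad \psi_3(\xi) = \xi_2,$$
$$\psi_4(\xi) = 4\xi_1(1 - \xi_1 - \xi_2), \quad \psi_5(\xi) = 4\xi_1\xi_2, \quad \psi_6(\xi) = 4\xi_2(1 - \xi_1 - \xi_2). \tag{32}$$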
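A short check of the nodal behaviour of the six p-hierarchical reference functions may help fix ideas. The sketch assumes the reading of the functions as three linear vertex functions followed by three quadratic edge functions, with local nodes 4, 5, 6 at the edge midpoints (1/2, 0), (1/2, 1/2), (0, 1/2); the function name `psi` and the list names are illustrative.

```python
from fractions import Fraction as Fr

def psi(xi1, xi2):
    """Values of the six p-hierarchical reference functions at (xi1, xi2)."""
    lam = 1 - xi1 - xi2
    return [lam, xi1, xi2, 4 * xi1 * lam, 4 * xi1 * xi2, 4 * xi2 * lam]

half = Fr(1, 2)
vertices = [(Fr(0), Fr(0)), (Fr(1), Fr(0)), (Fr(0), Fr(1))]
midpoints = [(half, Fr(0)), (half, half), (Fr(0), half)]   # local nodes 4, 5, 6

vertex_values = [psi(*p) for p in vertices]
midpoint_values = [psi(*p) for p in midpoints]

for row in vertex_values + midpoint_values:
    print([str(value) for value in row])
```

The quadratic functions vanish at all vertices and satisfy the Kronecker property on the midpoints, while the linear functions take the value 1/2 at the midpoints of their adjacent edges. That is the hierarchical, rather than nodal, character of the basis.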

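We write the integral (31) in the form (25). For the numerical integration of 
the resulting integrals over $\Delta$ we use quadrature rules which we derive from the 
quadrature rules (26) by extrapolation. Specifically, we apply for the computation of 
the first, the second, and the third term the quadrature rules 

$$\int_\Delta v(\xi)\,d\xi \approx \left\{ \tfrac{4}{3} \left( \tfrac{1}{4} v(\xi^{(4)}) + \tfrac{1}{4} v(\xi^{(5)}) + \tfrac{1}{2} v(\xi^{(6)}) \right) - \tfrac{1}{3} v(\xi^{(1)}) \right\} \mathrm{meas}\,\Delta \tag{33}$$

$$\int_\Delta v(\xi)\,d\xi \approx \left\{ \tfrac{4}{3} \left( \tfrac{1}{4} v(\xi^{(7)}) + \tfrac{1}{4} v(\xi^{(8)}) + \tfrac{1}{2} v(\xi^{(9)}) \right) - \tfrac{1}{3} v(\xi^{(2)}) \right\} \mathrm{meas}\,\Delta \tag{34}$$

$$\int_\Delta v(\xi)\,d\xi \approx \left\{ \tfrac{4}{3} \left( \tfrac{1}{4} v(\xi^{(10)}) + \tfrac{1}{4} v(\xi^{(11)}) + \tfrac{1}{2} v(\xi^{(12)}) \right) - \tfrac{1}{3} v(\xi^{(3)}) \right\} \mathrm{meas}\,\Delta \tag{35}$$

with $\xi^{(\sigma)}$, $\sigma = 1,2,3$, from (27) and 

$$\begin{aligned} \xi^{(4)} &= \left(\tfrac{1}{4}, 0\right), & \xi^{(5)} &= \left(\tfrac{3}{4}, 0\right), & \xi^{(6)} &= \left(\tfrac{1}{4}, \tfrac{1}{2}\right), \\ \xi^{(7)} &= \left(0, \tfrac{1}{4}\right), & \xi^{(8)} &= \left(0, \tfrac{3}{4}\right), & \xi^{(9)} &= \left(\tfrac{1}{2}, \tfrac{1}{4}\right), \\ \xi^{(10)} &= \left(\tfrac{3}{4}, \tfrac{1}{4}\right), & \xi^{(11)} &= \left(\tfrac{1}{4}, \tfrac{3}{4}\right), & \xi^{(12)} &= \left(\tfrac{1}{4}, \tfrac{1}{4}\right). \end{aligned} \tag{36}$$

A simple calculation shows that the quadrature rules (33)-(35) are exact for quadratic 
functions. 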
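The "simple calculation" can be carried out mechanically. The sketch below assumes the quadrature points as read here: each extrapolated rule takes the one-point rule of type (26) on the four sub-triangles (the point shared by two sub-triangles gets double weight), multiplies it by 4/3, and subtracts 1/3 of the one-point rule on the whole triangle. Exact integrals of monomials over the reference triangle are a! b! / (a+b+2)!. All function and variable names are illustrative.

```python
from fractions import Fraction as Fr
from math import factorial

MEAS = Fr(1, 2)  # area of the reference triangle

# Per rule: coarse point from (27), the two fine points counted once,
# and the fine point shared by two sub-triangles (counted twice).
RULES = {
    33: ((Fr(1, 2), Fr(0)), ((Fr(1, 4), Fr(0)), (Fr(3, 4), Fr(0))), (Fr(1, 4), Fr(1, 2))),
    34: ((Fr(0), Fr(1, 2)), ((Fr(0), Fr(1, 4)), (Fr(0), Fr(3, 4))), (Fr(1, 2), Fr(1, 4))),
    35: ((Fr(1, 2), Fr(1, 2)), ((Fr(3, 4), Fr(1, 4)), (Fr(1, 4), Fr(3, 4))), (Fr(1, 4), Fr(1, 4))),
}

def exact_integral(a, b):
    """Integral of xi1**a * xi2**b over the reference triangle."""
    return Fr(factorial(a) * factorial(b), factorial(a + b + 2))

def one_point_rule(v, rule):
    """Rule of type (26): one midpoint value times the area."""
    return v(*RULES[rule][0]) * MEAS

def extrapolated_rule(v, rule):
    """4/3 * (one-point rule on the sub-triangles) - 1/3 * (one-point rule)."""
    coarse, (p, q), shared = RULES[rule]
    fine = Fr(1, 4) * v(*p) + Fr(1, 4) * v(*q) + Fr(1, 2) * v(*shared)
    return (Fr(4, 3) * fine - Fr(1, 3) * v(*coarse)) * MEAS

def is_exact(rule_fn, rule, max_degree):
    """True if rule_fn integrates every monomial up to max_degree exactly."""
    return all(
        rule_fn(lambda x, y, a=a, b=b: x**a * y**b, rule) == exact_integral(a, b)
        for a in range(max_degree + 1)
        for b in range(max_degree + 1 - a)
    )

for rule in RULES:
    print(rule,
          is_exact(one_point_rule, rule, 0),
          is_exact(one_point_rule, rule, 1),
          is_exact(extrapolated_rule, rule, 2))
```

None of the one-point rules is exact even for linear functions, so the gain to quadratic exactness comes entirely from the extrapolation step.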
Because of the smoothness of the coefficient functions $a_{ij}$ in (2) one can prove 
that the quadrature rules (26) and (33)-(35) lead, for sufficiently small discretization 
parameters $h$, to a uniformly elliptic bilinear form $\tilde a(\cdot,\cdot)$. 

In the following we prove that the extrapolated stiffness matrix in (17) is equal 
to the stiffness matrix resulting from a discretization with p-hierarchical functions, 
where we assume that we use the quadrature rules (26) in the case of the h-hierarchical 
basis, and (33)-(35) in the case of the p-hierarchical basis. 

Lemma 2. If we compute the element stiffness matrices $K^L_{l-1}$, $K^L_l$, and $K^Q_l$ as 
described above, i.e. by means of the quadrature rules (26) and (33)-(35), the relation 

$$K_l^{L,ex} = K_l^Q \tag{37}$$

holds. 

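Proof: The proof is based on comparing the matrices $K_l^{L,ex}$ and $K_l^Q$ element by 
element. The extrapolated stiffness matrix $K_l^{L,ex}$ and the matrix $K_l^Q$ have the block 
structure 

$$K_l^{L,ex} = \begin{pmatrix} \tfrac{4}{3} K^L_{l,vv} - \tfrac{1}{3} K^L_{l-1} & \tfrac{4}{3} K^L_{l,vm} \\ \tfrac{4}{3} K^L_{l,mv} & \tfrac{4}{3} K^L_{l,mm} \end{pmatrix}, \qquad K_l^Q = \begin{pmatrix} K^Q_{l,vv} & K^Q_{l,vm} \\ K^Q_{l,mv} & K^Q_{l,mm} \end{pmatrix}. \tag{38}$$

The entries of the stiffness matrix $K^L_{l-1}$ are computed using relations (18)-(23) 
and for the computation of the elements of the matrix $K^L_l$ we use relations (28)-(29). 

First, we now prove the identity of the coarse mesh blocks $K^{L,ex}_{l,vv} = K^Q_{l,vv}$. Using 
the quadrature rules of type (26) and the representation (25) with $\alpha^{(r)}, \beta^{(r)} = 1,2,3$, 
the elements of the matrix $K^{L,ex}_{l,vv}$ are defined by 

$$K^{L,ex,(ij)}_{l,vv} = \sum_{r \in \omega^{(ij)}_{l-1}} \Bigg\{ \tfrac{4}{3} \sum_{t=4}^{6} \gamma_t\, \tilde b^{(r)}_{11}(x(\xi^{(t)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(t)})}{\partial \xi_1} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(t)})}{\partial \xi_1} - \tfrac{1}{3} \gamma_1\, \tilde b^{(r)}_{11}(x(\xi^{(1)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(1)})}{\partial \xi_1} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(1)})}{\partial \xi_1}$$
$$+ \tfrac{4}{3} \sum_{t=7}^{9} \gamma_t\, \tilde b^{(r)}_{22}(x(\xi^{(t)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(t)})}{\partial \xi_2} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(t)})}{\partial \xi_2} - \tfrac{1}{3} \gamma_2\, \tilde b^{(r)}_{22}(x(\xi^{(2)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(2)})}{\partial \xi_2} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(2)})}{\partial \xi_2}$$
$$+ \tfrac{4}{3} \sum_{t=10}^{12} \gamma_t\, \tilde b^{(r)}_{12}(x(\xi^{(t)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(t)})}{\partial \xi_s} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(t)})}{\partial \xi_s} - \tfrac{1}{3} \gamma_3\, \tilde b^{(r)}_{12}(x(\xi^{(3)})) \frac{\partial \varphi_{\beta^{(r)}}(\xi^{(3)})}{\partial \xi_s} \frac{\partial \varphi_{\alpha^{(r)}}(\xi^{(3)})}{\partial \xi_s} \Bigg\}$$

with $\gamma_1 = \gamma_2 = \gamma_3 = \mathrm{meas}\,\Delta$, $\gamma_4 = \gamma_5 = \gamma_7 = \gamma_8 = \gamma_{10} = \gamma_{11} = \mathrm{meas}\,\Delta^{(k)}$, and 
$\gamma_6 = \gamma_9 = \gamma_{12} = 2\,\mathrm{meas}\,\Delta^{(k)}$. 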

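For the entries of the matrix $K^Q_l$ we get, by using relations (30)-(32) and the 
quadrature rules (33)-(35), [...]. 

If we examine the values of the partial derivatives $\partial\varphi_{\alpha^{(r)}}/\partial\xi_1$, $\partial\varphi_{\alpha^{(r)}}/\partial\xi_2$, $\partial\varphi_{\alpha^{(r)}}/\partial\xi_s$, 
$\partial\psi_{\alpha^{(r)}}/\partial\xi_1$, $\partial\psi_{\alpha^{(r)}}/\partial\xi_2$, $\partial\psi_{\alpha^{(r)}}/\partial\xi_s$, $\alpha^{(r)} = 1,2,3$, given in Table 1 and the relations 
$\mathrm{meas}\,\Delta = 4\,\mathrm{meas}\,\Delta^{(k)}$, $k = 1,2,3,4$, $\psi_{\alpha^{(r)}} = \varphi_{\alpha^{(r)}}$, $\alpha^{(r)} = 1,2,3$, we see that 

$$K^{L,ex,(ij)}_{l,vv} = K^{Q,(ij)}_{l,vv} \quad \text{for } i,j = 1, 2, \ldots, N_{l-1}, \quad \text{i.e.} \quad K^{L,ex}_{l,vv} = K^Q_{l,vv}.$$

In the same way we can prove 

$$K^{L,ex}_{l,vm} = K^Q_{l,vm}, \quad K^{L,ex}_{l,mv} = K^Q_{l,mv}, \quad \text{and} \quad K^{L,ex}_{l,mm} = K^Q_{l,mm}.$$

This completes the proof. ∎ 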

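Next we discuss the computation of the entries of the load vectors. For the entries 
of the vector $f^L_{l-1}$ we get 

$$f^{L,(i)}_{l-1} = \langle F, p^{(i)}_{l-1} \rangle = \sum_{r \in \omega^{(i)}_{l-1}} \int_{\delta^{(r)}_{l-1}} f(x)\, p^{(i)}_{l-1}(x)\,dx = \sum_{r \in \omega^{(i)}_{l-1}} \int_\Delta f(x(\xi))\, \varphi_{\alpha^{(r)}}(\xi)\, |\det J^{(r)}_{l-1}|\,d\xi \tag{39}$$

with $\omega^{(i)}_{l-1} = \{ r : p^{(i)}_{l-1} \not\equiv 0 \text{ on } \delta^{(r)}_{l-1} \}$. 

The integrals over $\Delta$ we compute by using the quadrature rule 

$$\int_\Delta v(\xi)\,d\xi \approx \left( \tfrac{1}{3} v(0,0) + \tfrac{1}{3} v(1,0) + \tfrac{1}{3} v(0,1) \right) \mathrm{meas}\,\Delta. \tag{40}$$

Obviously, this formula is exact for linear functions $v$. 

The entries of the vector $f^L_l$ are defined by 

$$f^{L,(i)}_l = \langle F, p^{(i)} \rangle = \sum_{r \in \omega^{(i)}_{l-1}} \int_{\delta^{(r)}_{l-1}} f(x)\, p^{(i)}(x)\,dx = \sum_{r \in \omega^{(i)}_{l-1}} \int_\Delta f(x(\xi))\, \varphi_{\alpha^{(r)}}(\xi)\, |\det J^{(r)}_{l-1}|\,d\xi$$
$$= \sum_{r \in \omega^{(i)}_{l-1}} \sum_{k=1}^{4} \int_{\Delta^{(k)}} f(x(\xi))\, \varphi_{\alpha^{(r)}}(\xi)\, |\det J^{(r)}_{l-1}|\,d\xi \tag{41}$$

with $p^{(i)} = p^{(i)}_{l-1}$ for $i = 1, \ldots, N_{l-1}$, $p^{(i)} = p^{(i)}_l$ for $i = N_{l-1}+1, \ldots, N_l$ and the 
functions $\varphi_{\alpha^{(r)}}$ from (29). The integrals over $\Delta^{(k)}$ are computed by a formula of the 
type (40). 

In the case of the p-hierarchical basis, the entries of the load vector $f^Q_l$ are given 
by 

$$f^{Q,(i)}_l = \langle F, p^{(i)} \rangle = \sum_{r \in \omega^{(i)}_{l-1}} \int_{\delta^{(r)}_{l-1}} f(x)\, p^{(i)}(x)\,dx = \sum_{r \in \omega^{(i)}_{l-1}} \int_\Delta f(x(\xi))\, \psi_{\alpha^{(r)}}(\xi)\, |\det J^{(r)}_{l-1}|\,d\xi \tag{42}$$

with $p^{(i)} = p^{(i)}_{l-1}$ for $i = 1, \ldots, N_{l-1}$, $p^{(i)} = q^{(i)}_{l-1}$ for $i = N_{l-1}+1, \ldots, N_l$ and the 
functions $\psi_{\alpha^{(r)}}$ from (32). For the computation of the integrals over $\Delta$ we use the 
quadrature rule 

$$\int_\Delta v(\xi)\,d\xi \approx \left( \tfrac{1}{3} v(\xi^{(1)}) + \tfrac{1}{3} v(\xi^{(2)}) + \tfrac{1}{3} v(\xi^{(3)}) \right) \mathrm{meas}\,\Delta \tag{43}$$

with $\xi^{(\sigma)}$, $\sigma = 1,2,3$, from (27). This formula is exact for quadratic functions $v$. 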

Lemma 3. If the load vectors $f^L_{l-1}$, $f^L_l$, and $f^Q_l$ are defined as described above, then 
the relation 

$$f_l^{L,ex} = f_l^Q \tag{44}$$

holds. 

Proof: Using the relations (39), (41) and quadrature rules of type (40) for computing 
the extrapolated load vector $f_l^{L,ex}$ (see (17)) as well as relations (42) and (43) for 
computing the load vector $f_l^Q$, the proof follows immediately. ∎ 

A consequence of Lemma 2 and Lemma 3 is the following theorem. 

Theorem 4. If the extrapolated stiffness matrix $K_l^{L,ex}$ and the extrapolated load 
vector $f_l^{L,ex}$ as well as the stiffness matrix $K_l^Q$ and the load vector $f_l^Q$ are computed 
as described above, then the systems of algebraic FE equations 

$$K_l^{L,ex} u_l = f_l^{L,ex} \quad \text{and} \quad K_l^Q u_l = f_l^Q \tag{45}$$

have the same solution. 

Now we can immediately prove the following convergence theorem for the 
algorithm MG-EX. 

Theorem 5. Under the assumption that the extrapolated stiffness matrix $K_l^{L,ex}$, 
the extrapolated load vector $f_l^{L,ex}$, the stiffness matrix $K_l^Q$, and the load vector $f_l^Q$ are 
computed as discussed in this section, the following statements hold. 

(i) The iterates of algorithm MG-EX converge to a FE solution which has the 
same discretization error as a FE solution obtained by a FE discretization with 
p-hierarchical functions. 

(ii) The convergence rate of algorithm MG-EX does not depend on the discretization 
parameter. 


Proof: The statement (i) follows from the interpretation of algorithm MG-EX as 
a usual multigrid algorithm for solving the system of algebraic equations $K_l^{L,ex} u_l = f_l^{L,ex}$ 
and the equivalence of the systems of algebraic equations $K_l^{L,ex} u_l = f_l^{L,ex}$ and 
$K_l^Q u_l = f_l^Q$. 

Statement (ii) can be proved in a way analogous to the piecewise constant 
coefficient case in [8]. ∎ 

Remark: We can also formulate algorithm MG-EX in terms of a piecewise linear 
nodal basis. All our results are also valid in this case. 
Numerical results 


In this section we want to confirm our theoretical results by a numerical example. 
We will illustrate that the iterates of the algorithm MG-EX converge to the FE 
solution which we would obtain by a discretization of problem (1) with p-hierarchical 
functions. Furthermore, the numerical example shows that the convergence rate of 
algorithm MG-EX is independent of the discretization parameter. 

All algorithms have been implemented within the multigrid package FEMGP [10]. 
The computations were performed on a PC 80486 (33 MHz) using the LAHEY-Fortran 
compiler. 

Let us consider the problem (1), where $\Omega = (0,1) \times (0,1)$, 

$$A(x) = \begin{pmatrix} a_{11}(x) & 0 \\ 0 & a_{22}(x) \end{pmatrix}, \quad a_{11}(x) = 1.1 - \tanh(3x_1 + 3x_2 - 4.5), \quad \text{and} \quad a_{22}(x) = 2\,a_{11}(x).$$

The right-hand side $f(x)$ is chosen such that the function 

$$u(x) = x_1(1 - x_1)\,x_2(1 - x_2)\,(1 + \tanh(3x_1 + 3x_2 - 4.5))$$

is the exact solution of problem (1). 

Starting from the coarsest triangulation $\mathcal{T}_1$ (see Figure 2) the finer triangulations 
have been generated by dividing all triangles of the triangulation $\mathcal{T}_k$, $k = 1, 2, \ldots, l-1$, 
into four smaller congruent sub-triangles. In Table 2 we give the numbers of nodes 
and the numbers of triangles in each triangulation. 

Figure 2: Mesh $\mathcal{T}_1$ and iso-lines of the solution $u$ 

For Algorithm MG-EX we used as pre-smoother two sweeps of the lexicographically 
forward Gauss-Seidel method for solving system (9), one iteration step of an 
$(l-1)$-grid algorithm for solving the coarse-grid system (12), and two sweeps of the 
lexicographically backward Gauss-Seidel method in the post-smoothing step. The 
initial guess was obtained by a full multigrid strategy. On the levels $k = 1, 2, \ldots, l-1$ 
a usual multigrid algorithm for solving the corresponding FE equations in the linear 
nodal basis was performed. Within these $k$-grid algorithms one V-cycle with two 
Gauss-Seidel sweeps lexicographically forward in the pre-smoothing step and two 
Gauss-Seidel sweeps lexicographically backward in the post-smoothing step was 
used. 

triangulation          T_1     T_2     T_3     T_4     T_5
number of nodes        78      281     1065    4145    16353
number of triangles    126     504     2016    8064    32256

Table 2: Number of nodes and number of triangles in $\mathcal{T}_k$, $k = 1, 2, \ldots, 5$ 

The convergence criterion for MG-EX was 

$$\| f_l^{L,ex} - K_l^{L,ex} u_l^{(k+1,0)} \| \leq 10^{-4}\, \| f_l^{L,ex} - K_l^{L,ex} u_l^{(0,0)} \|, \tag{46}$$

where $\|\cdot\|$ denotes the Euclidean norm in the space $R^{N_l}$, and $u_l^{(0,0)}$ is the initial guess. 
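To make the smoother and the stopping test concrete, the following minimal sketch runs one forward and one backward lexicographic Gauss-Seidel sweep per iteration and stops once the Euclidean residual norm has dropped by the factor 1e-4 used in (46). This is only the smoother used as a stand-alone iteration on a hypothetical one-dimensional model matrix, not the MG-EX algorithm or the FEMGP implementation; all names are illustrative.

```python
import math

def sweep(K, f, u, backward=False):
    """One lexicographic Gauss-Seidel sweep (forward or backward), in place."""
    n = len(f)
    order = reversed(range(n)) if backward else range(n)
    for i in order:
        s = sum(K[i][j] * u[j] for j in range(n) if j != i)
        u[i] = (f[i] - s) / K[i][i]

def residual_norm(K, f, u):
    """Euclidean norm of f - K u."""
    return math.sqrt(sum((fi - sum(kij * uj for kij, uj in zip(row, u))) ** 2
                         for row, fi in zip(K, f)))

def iterate(K, f, u, reduction=1e-4, max_iter=1000):
    """Iterate until the residual is reduced by `reduction` relative to the start."""
    r0 = residual_norm(K, f, u)
    for k in range(1, max_iter + 1):
        sweep(K, f, u)                  # forward sweep
        sweep(K, f, u, backward=True)   # backward sweep keeps the pair symmetric
        if residual_norm(K, f, u) <= reduction * r0:
            return k
    raise RuntimeError("no convergence within max_iter")

# Hypothetical model problem: tridiagonal (-1, 2, -1) matrix with 8 unknowns.
n = 8
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = [1.0] * n
u = [0.0] * n
r_start = residual_norm(K, f, u)
iterations = iterate(K, f, u)
print(iterations, residual_norm(K, f, u) / r_start)
```

Used alone, such a smoother needs an iteration count that grows with the problem size, which is the behaviour the multigrid coarse grid correction is meant to remove.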

In Table 3 we present the number of iterations and the CPU-time needed by the 
algorithm MG-EX. The convergence behavior of our algorithm can be improved by 
introducing additional pre-smoothing and post-smoothing steps, i.e. before step 1 in 
the algorithm MG-EX we perform one iteration step of the Gauss-Seidel method 
lexicographically forward and after step 3 one iteration step of the Gauss-Seidel 
method lexicographically backward, both applied to the system of algebraic equations 
$K_l^{L,ex} u_l^{L,ex} = f_l^{L,ex}$. This is illustrated in column MG-EX(1) of 
Table 3. 


        Algorithm MG-EX                  Algorithm MG-EX(1)
l       number of       CPU-time         number of       CPU-time
        iterations                       iterations
3       11              2.06 sec         6               1.54 sec
4       11              9.61 sec         5               6.28 sec
5       12              45.77 sec        5               27.54 sec

Table 3: Comparison of the algorithm MG-EX and the algorithm MG-EX(1) 

Finally, we compare the discretization errors $\|u - u_l^{L,ex}\|$ and $\|u - u_l^Q\|$ in the $H^1$-norm 
and in the $L_2$-norm. Here $u_l^{L,ex}$ denotes the FE solution obtained by means 
of the algorithm MG-EX and $u_l^Q$ the FE solution obtained by a discretization with piecewise 
p-hierarchical functions. 

Table 4 shows that the algorithm MG-EX yields discretization errors which are 
typical for discretizations with piecewise quadratic functions, i.e. we can observe an 
error of order $O(h_l^2)$ in the $H^1$-norm and $O(h_l^3)$ in the $L_2$-norm. 
Level l    ||u - u_l^{L,ex}||_{H^1}    ||u - u_l^{L,ex}||_{L_2}    ||u - u_l^Q||_{H^1}    ||u - u_l^Q||_{L_2}
3          0.5038E-03                  0.4353E-05                  0.5038E-03             0.4354E-05
4          0.1246E-03                  0.5269E-06                  0.1246E-03             0.5269E-06
5          0.3101E-04                  0.5861E-07                  0.3101E-04             0.5858E-07


Table 4: Comparison of the discretization errors 
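The orders quoted for Table 4 can be read off from the error ratios between consecutive levels. Since each refinement halves the mesh size, the observed order is the base-2 logarithm of the ratio. The sketch below uses the values of Table 4 as printed; the dictionary keys and the function name are illustrative.

```python
import math

# Errors from Table 4 on levels l = 3, 4, 5.
errors = {
    "H1, MG-EX": [0.5038e-03, 0.1246e-03, 0.3101e-04],
    "L2, MG-EX": [0.4353e-05, 0.5269e-06, 0.5861e-07],
    "H1, p-hierarchical": [0.5038e-03, 0.1246e-03, 0.3101e-04],
    "L2, p-hierarchical": [0.4354e-05, 0.5269e-06, 0.5858e-07],
}

def observed_orders(errs):
    """log2 of successive error ratios, assuming the mesh size halves per level."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errs, errs[1:])]

orders = {name: observed_orders(errs) for name, errs in errors.items()}
for name, values in orders.items():
    print(name, [round(p, 2) for p in values])
```

The observed orders come out close to 2 in the H^1-norm and slightly above 3 in the L_2-norm for both columns, in line with the behaviour expected of piecewise quadratic elements.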


Conclusion 

In this paper we have presented the analysis of an algorithm which can algebraically 
be understood as multigrid with τ-extrapolation. In practice, this algorithm 
is simple to implement, once a multigrid algorithm is available. However, we have 
shown that the algorithm converges to the same solution as a higher order finite element 
discretization. The algorithm can thus be used on unstructured meshes in an adaptive 
refinement setting. Furthermore, it is independent of global error expansions, 
and can thus be applied locally. 